PREFACE TO FOURTH EDITION


This book was first published in 1901, and was then based 
on lectures delivered at the School of Economics in the five 
years following its foundation in 1895. Two further editions 
have been issued in which the text was revised without any 
important alteration, and an Appendix added dealing with the 
second approximation to the normal curve of error, and sub- 
sequently some pages of addenda were circulated. In the 
present edition Part I remains substantially as it was in 1901,
except that Section III of Chapter III has been replaced by a 
new illustration, the chapter on Averages has been rearranged, 
a chapter on the measurement of dispersion takes the place of 
the former Chapter V, in Chapter IX the treatment of retail 
index-numbers has been reconsidered, and the second section 
of Chapter X has been recast. At the same time those parts 
of the text which were out of date have been replaced by more
modern material and the whole has been revised, but with as
little alteration of the original as possible, since a revised 
version may by too much attention to detail destroy the balance 
of the original. On the other hand, Part II has been completely 
rewritten and considerably extended, both by the more detailed 
and extended treatment of theory and by the addition of a 
number of examples which illustrate the arithmetical use of 
the formulae and show the scope of the application of the theory. 
For the convenience of those who possess the earlier edition,
to whom the revised Part I contains little that is new, Part II
is issued separately; while for those whose mathematical knowledge
is too slight to allow them to follow the treatment in
Part II in its new form Part I is also issued separately.* But
the two Parts together are essentially one book with a common
index and with cross references from one to another.

* The Parts are no longer issued separately. (April, 1926.)

The whole book is intended to form a general introduction
to the theory and practice of statistics for all persons whose
business it is to handle them or to whom a general understanding
both of the utility of statistical results and the limitations
of statistical investigation is important. It is not in any way
intended to be a compendium of facts, and the tables inserted
are only to afford illustrations of method, nor does it contain
any detailed account of published statistics; but it is hoped
that a reader will find himself in a position to understand, and
above all to appraise and criticise, tables and results published
officially or otherwise relating to any of those very numerous 
subjects in which numerical knowledge of facts and their inter- 
relation is essential. No attempt is made to treat the history 
or bibliography of the subject ; there are many books extant 
in English, French and German which devote considerable 
space to the historical development of the methods and practice 
of statistics, with bibliographical references ; it seemed better 
here to omit these aspects altogether than to give them a 
cursory treatment. With these limitations it is hoped that 
the treatment in Part I covers adequately the great part of the 
methods and technique necessary for ordinary statistical work 
so far as this can be done without the use of any but the most 
elementary mathematics. The chapter on Interpolation, 
indeed, uses symbols which at first sight may look formidable 
to the non-mathematician; but in fact the use of finite 
differences and of Newton’s formula of interpolation is quite 
simple and the arithmetic involved very easy, and the great 
part of the chapter should be readily intelligible to those who 
have a school training in graphic algebra. 

Part II makes much greater demands both on preliminary 
training and on the power of following somewhat involved 
abstract reasoning. The actual knowledge postulated is that 
obtainable in a graduate course on the calculus, and the only
theorems not generally included in such a course are proved 
(in an abbreviated form) in the Appendix. In the first edition 
an effort was made to obtain the principal results without the 
use of the Calculus ; but as the subject has developed during 
the past twenty years, it has become necessary to abandon this 
attempt. The results that can be reached by algebra alone 
are no doubt important and useful, but there is so much of at 
least equal utility that can only be appreciated after more 
advanced mathematical study that a student will save time
in the end by becoming familiar with the elements of the
infinitesimal calculus before he commences the serious study of
mathematical statistics. This opinion is confirmed by the very
loose reasoning often employed by writers who make too facile
use of the standard deviation, of curves of frequency and
especially of the coefficient of correlation. Very great care
has been taken in Chapter VI, Part II, to show as exactly as 
possible the meaning of the measurement of correlation by this 
coefficient and its implications, and very much more might
have been said before the subject was too thoroughly explored. 
No one should attempt to measure correlation till he has 
studied the theory closely and critically. 

Though the treatment in Part II is intended to serve as a 
general introduction to mathematical statistics whatever the 
subject-matter to which they are applied and to include defini- 
tions and explanations of the terms and measurements in 
common use, so as to be of assistance to students in all branches 
of science that involve group measurements, yet the order of 
treatment and in particular the worked examples are chosen 
principally with reference to the problems that arise in socio- 
logical and economic investigations, many of the examples in 
fact being taken from researches I have personally made in 
which mathematical treatment was only introduced so far as 
the line of inquiry called for it. In consequence of this the 
reader who is familiar with the writings of Professor Karl 
Pearson, Mr. Elderton, Mr. Hardy, Mr. Yule and Dr. Green- 
wood will notice that little emphasis is laid on applications to 
biological or to actuarial problems, while prominence is given 
to formulae and to methods which have received less attention. 

It is unfortunately the case that a great deal of controversy
has arisen with reference not only to the best methods of treatment,
but also to the fundamental conceptions that underlie
the application of the principles of mathematical probability to
statistical observations. I cannot hope to have avoided controversial
questions (for, indeed, if these were rigidly excluded
there would be little left), but I have endeavoured to put in the
foreground those methods and principles which command
general acceptance and to omit those which are the subject of
dispute and are unessential. In one respect, however, a definite
course is followed which will not meet with universal approval;
in my opinion the standard deviation has only limited utility
unless it is connected with a table of probability by which the
chances of exceeding given multiples of this deviation can be
calculated, and consequently I have emphasised the normality

Formulae for the standard deviation of the mean difference
for frequency groups were worked out by me, with Mr. R. G. D.
Allen's help, in February 1936, and communicated to the
International Congress of Mathematicians at Oslo in July.
During the Congress Mr. H. Wold found a more direct way of
obtaining them. Only that relating to the normal curve is
here given (p. 487).

It is feared that the Supplement will not provide easy 
reading. It is deliberately compressed without omitting essen- 
tials. Here I can only say that an attempt has been made to 
simplify the treatment of problems in the articles or books in 
which they were first considered, and to avoid the use of mathe- 
matical methods which are unfamiliar to the non-expert. 

A. L. B. 

Marley Hill,

January 1937.



CONTENTS 


PART I 

GENERAL ELEMENTARY METHODS 

CHAP.

I. Scope and Meaning of Statistics

II. The General Method of Statistical Investigation

III. Definition of Unit. Collection of Data
    SECTION 1. THE POPULATION CENSUS
    SECTION 2. THE WAGE CENSUS
    SECTION 3. EXAMPLE OF AN UNOFFICIAL INVESTIGATION
    SECTION 4. STATISTICS OF ENGLAND'S FOREIGN TRADE

IV. Tabulation
    GENERAL; MR. BOOTH’S USE OF CENSUS; AGRICULTURAL
    EARNINGS; U.S.A. WAGE STATISTICS; WAGE CENSUS;
    CHANGES OF WAGES

V. Averages
    (a) arithmetic; (b) weighted; (c) statistical coefficients;
    (d) the mode; (e) the median; (f) geometric mean; (g) general

VI. Measurements of Dispersion and of Skewness.
    Application of Averages

VII. The Graphic Method
    1. GENERAL PURPOSE
    2. HISTORICAL DIAGRAMS
    3. COMPARISONS OF SERIES OF FIGURES
    4. PERIODIC FIGURES
    5. LOGARITHMIC CURVES

VIII. Accuracy

IX. Index-Numbers

X. Interpolation
    SECTION 1. GENERAL
    SECTION 2. ALGEBRAIC TREATMENT



PART II 

APPLICATIONS OF MATHEMATICS TO STATISTICS 

CHAP.

I. Introductory. Frequency Groups and Curves

II. Algebraic Probability and the Normal Curve of Error

III. The Law of Great Numbers

IV. Applications of the Law of Error

V. Empirical Frequency Equations

VI. Theory of Correlation

VII. Examples of Correlation

VIII. Partial and Multiple Correlation

IX. Precision of Measurements of Averages, Moments
    and Correlation

X. Tests of Correspondence between Data and Formula

Appendix. Mathematical Notes
    1. Wallis’ Theorem for the Value of π
    2. Sum of Powers of Integers
    3. Stirling's Formula for m!
    4. The Euler-Maclaurin Theorem
    5. Sheppard’s Corrections
    6. Moments of Second Approximation to the Curve of Error
    7. Ratio of Unweighted Averages
    8. Ratio of Weighted Averages
    9. Normality of Standard Deviations
    10. The Method of Least Squares
    11. Simpler Method for p. 429


SUPPLEMENTS 

I. Kurtosis

II. Correction for Mean Deviation

III. Lorenz’ and Pareto’s Curves

IV. Time Series

V. The Logistic Curve

VI. Transformations of the Normal Curve

VII. Correlation of Ranks

VIII. Determinants. Rectilinear Regression

IX. Frequency of the Second Moment

X. Standard Deviations of Percentiles, etc.

XI. Standard Deviation of the Correlation Coefficient

XII. The Method of Confidence Belts

XIII. The Test of Goodness of Fit




LIST OF DIAGRAMS


PART I

GRAPHIC METHOD OF FINDING THE MEDIAN, QUARTILES AND DECILES

GRAPHIC REPRESENTATION OF WAGE STATISTICS

DISTRIBUTION BY AGE OF MARRIED MEN, ENGLAND AND WALES, 1911

TOTAL VALUE OF BRITISH AND IRISH PRODUCE EXPORTED FROM
THE UNITED KINGDOM, 1855-1906

GRAPHIC METHODS OF DETERMINING THE MEDIAN AND MODES

REVENUE OF THE UNITED KINGDOM, 1850-1905

IMPORTATION OF WHEAT AND WHEAT FLOUR, 1862-1906

TRADE OF BRITISH POSSESSIONS AND FOREIGN COUNTRIES

MARRIAGE RATE AND FOREIGN TRADE

FLUCTUATIONS OF EMPLOYMENT, 1855-1893

GROWTH OF IMPORTS AND EXPORTS (NATURAL AND LOGARITHMIC SCALE)

MARRIAGE RATE AND EMPLOYMENT (LOGARITHMIC SCALE)

PART II

AVERAGES OF ARRAYS AND LINE OF REGRESSION

THE SKEW CURVE OF ERROR

THE NORMAL CURVE OF ERROR

SUPPLEMENT

A. Kurtosis

B. Mean Deviation

C. Lorenz Curve

D. Transformation of the Normal Curve

E. Confidence Belts






PART I 


GENERAL ELEMENTARY METHODS. 





CHAPTER I. 

SCOPE AND MEANING OF STATISTICS. 

Very many definitions have been given of the word statistics,
and each author who has written on the subject has
assigned new limits to the field which should be
included in its scope. It will not be necessary
for the purpose of this book to discuss the merely verbal
differences involved, but only to explain what is intended by
its title, and to consider the limits of the science which it is 
proposed to investigate. It will be useful, however, to mention 
some possible definitions. 

Statistics may, for instance, be called the science of counting.
Counting appears at first sight to be a very simple operation,
which any one can perform or which can be done
automatically; but, as a matter of fact, when we
come to large numbers, e.g., the population of the United Kingdom,
counting is by no means easy, or within the power of an
individual; limits of time and place alone prevent it being so
carried out, and in no way can absolute accuracy be obtained
when the numbers surpass certain limits. Great numbers are
not counted correctly to a unit, they are estimated; and we
might perhaps point to this as a division between
arithmetic and statistics, that whereas arithmetic
attains exactness, statistics deals with estimates,
sometimes very accurate, and often sufficiently so for their
purpose, but never mathematically exact. Statistics generally
relate to numbers so great that their estimation is beyond the
power of an individual, and requires the co-operation of
an organised body of workers. Though the
co-operative collection of numbers by several persons and
the mere addition of the results seem simply
questions of arithmetic, yet in practice two difficulties soon
occur. First, it is not easy to define the thing to be counted
so explicitly that all the tellers shall admit and reject instances
on the same principles; for such simple objects as the number
of rooms or stories of a house, a person’s age, even an individual,
give rise to such complex questions of definition that
it is often impossible to tell from a short description of a
category exactly what items are included in it. Secondly,
numerical errors cannot be avoided when many workers are 
involved; for some among a large number of persons will be 
inaccurate, some unintelligent, some will not obtain complete 
information, and when their reports are compiled there will be 
occasional mistakes in copying and errors in tabulation. A 
total which is the result of the work of many hands will cer- 
tainly from one cause or another fall short of complete accuracy. 
But though all estimates of this nature are sometimes included 
under the term statistics, this definition at once is too wide,
and also does not bring out the distinctive nature of statistical 
method. 

It is better, in fact, to define statistics a posteriori. In
dealing with masses of figures, large numbers descriptive of
groups, series of totals or averages relating to
different dates or places, it is found that special
methods become necessary: methods which depend on particular
properties of large numbers, methods which are suitable
for describing complex groups so that they can be easily comprehended,
methods for analysing the accuracy of statements,
for measuring the significance of differences, for comparing one
estimate with another. Those estimates to which these
methods apply are within the scope of statistics; it is the
study of these methods that is the object of this book. It is
clear that, under our tentative definition, statistics is not
merely a branch of political economy, nor is it
confined to any one science. A knowledge of
statistics is like a knowledge of foreign languages
or of algebra: it may prove of use at any time under any
circumstances.

It may be interesting to trace the connection of statistical
method with various branches of knowledge. To begin with
the physical sciences: there are two points in
which this method touches astronomy. The
method of least squares was introduced by an astronomer,
anxious to choose the best of several slightly discrepant
observations of the position of a star. In most physical
observations several measurements are taken of the same
quantity, and it is found that, however carefully they are 
made, they never absolutely agree; just as the averages 
obtained by different statisticians from the same series of 
sociological observations are generally not identical. From 
such a group of measurements it is necessary to deduce the 
most probable estimates; this is done by the application of 
the law of error, in the form of the method of least squares. 
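
As a hedged illustration of the point just made: when several measurements of one quantity disagree, the method of least squares in its simplest form takes the value that makes the sum of squared deviations as small as possible, and for a single unknown that value is the arithmetic mean. The readings and function names below are invented for the sketch and are not taken from the text.

```python
# Hypothetical repeated measurements of one quantity (invented figures).
readings = [10.2, 9.8, 10.1, 9.9, 10.0]


def sum_of_squares(estimate, values):
    """Sum of squared deviations of the values from a trial estimate."""
    return sum((v - estimate) ** 2 for v in values)


def least_squares_estimate(values):
    """For a single unknown quantity the least-squares value is the mean."""
    return sum(values) / len(values)


best = least_squares_estimate(readings)

# Trial values on either side of the mean give a larger sum of squares.
for trial in (best - 0.05, best, best + 0.05):
    print(round(trial, 3), round(sum_of_squares(trial, readings), 4))
```

Moving the trial estimate away from the mean in either direction increases the sum of squares, which is the sense in which the mean is the "best" of the discrepant observations under this criterion.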

The other point of resemblance of statistical to astronomical
method is common also to geology and to most applied
sciences. The course of scientific measurement
has generally been to take first a rough observation
of a quantity, such as the distance of the sun, the thickness of
a stratum, the atomic weight of an element, the specific gravity
of a substance; then, as information accumulated, as the
precision of instruments increased and methods were better
adapted, to make the measurement gradually more and more
accurate. It is important to appreciate this development,
for in the present state of our knowledge, many statistical
measurements cannot be made with precision for want of data,
and a critic is inclined to say that for this reason preliminary
estimates are valueless; but from the scientific point of view
this criticism is wrong, for a faulty measurement made on
logical principles is better than none, if limits can be assigned
to its possible error, and may lead to others with progressive
improvement.

Passing by the general resemblance of statistical investigations
to all scientific experiments, we may notice the use of
statistics in biology. It was, perhaps, not recognised
before the publication of Professor Karl
Pearson’s investigations,* that the whole doctrine of evolution
and heredity rests in reality on a statistical basis. It is in
this direction that some of the most important new work in
mathematical statistics is being done. It may be worth while
to sketch very briefly the nature of the problem. Out of a
great number of observations, say the measurements of the
heights of a group of men, the type is found: an average,
about which all the measurements are grouped according to
some definite law. The problem is then to determine whether
this type or the grouping about it changes, and in what way.
The differences found in successive generations form the data
on which arguments as to evolution and development are
founded. The method applies equally to fossil remains, to
zoological species, and to many other groups. If it is neglected,
many valid arguments lose a great part of their force,
and theories are founded on personal impressions of phenomena
instead of on scientific measurement. The work done in this
direction becomes of immediate use to the student of social
questions. The average wage and the grouping about it and
the change in these quantities present precisely similar problems;
the correlation between the effects of different factors
is calculated by the same mathematical formulae; in fact,
these methods furnish the only accurate way of measuring
numerical changes in complex groups. Much valuable information
has been collected in anthropometrical laboratories,
which has increased the statistician’s knowledge of facts and
given birth to important theoretical principles.

* See The Grammar of Science, 1900, chap. x. seq., and the references
there given.

Meteorology has much in common with statistics. The
chief measurements taken for the purposes of this science are
of temperature, barometrical pressure, moisture
of the air, and force of the wind. One of the
problems attacked is again that of finding the type from a
group of observations, and of measuring its change. The
tables which state the average temperature year by year are
in many ways similar to those which the Registrar-General
publishes of births, deaths, and marriages. Without the aid
of statistical method, the averages obtained show mere numbers
from which no logical deductions can be made. With the
help of this knowledge, it can be seen whether the change from
year to year is significant or accidental; whether the figures
show a progressive or periodic change; whether they obey any
law or not. The problem is easily seen to be of importance for
forecasting the future population and for many similar purposes.

We are thus brought by a short step to the province to
which statistics has sometimes been confined: the study of
demography. If in demography we include, not
merely the measurement of the numbers of the
population, the birth, marriage, and death rates, the dis- 
tribution by age, by sex, and by locality, in fact, the figures 
which naturally come from the census and the Registrar- 
General’s returns; but include also, industrial and social 
measurements, of distribution of the population by trade, of 
income, wages, prices, production, foreign trade, transport, 
and so forth ; we have extended the limits of demography till 
it includes the majority of the statistical investigations directly 
interesting to students of sociology or of political economy. 
Without stopping to decide the exact limits of demography, 
we can quickly pass to another definition of statistics (so far 
as it concerns such students) on which it is wished to lay a 
certain stress : statistics is the science of the measurement of the 
social organism, regarded as a whole, in all its manifestations. 
In a monograph, after the fashion of Le Play, a
single family is studied; the occupations and
earnings of its members, the way these earnings
are spent, and its economic position generally are
set down; but this study is not so far statistical. In demography
we study the same quantities when groups of families
are concerned; the number of families engaged in certain 
industries, and their average receipts, expenditure, and savings ; 
here we have statistics. In the monographic method the individual
is everything; in the statistical method, nothing. When
we wish to obtain a measurement of the group, peculiarities 
of individuals receive no attention ; it is only when the same 
peculiarities are possessed by many persons that they become 
of importance. Statistics may rightly be called the science of 
averages. In the measurement of a complex group, say of 
incomes and wages, the exceptional artiste who can earn £100 
in an evening, and the inefficient labourer who can only make 
sixpence a day, affect only slightly the general average; they
are not entered in separate categories; but the large group of
skilled artisans who earned before 1914 forty shillings a week,
or of casual labourers who made less than fifteen shillings, are
entitled to separate notice. The exact specification to be
adopted is only a question of degree, which differs with the
nature of the particular investigation in hand. The object of
a statistical estimate of a complex group is to present an
outline, to enable the mind to comprehend with a single effort
the significance of the whole. To do this it is necessary to
exclude rigorously any presentation of details, for the same
reason that, in a painter’s rendering of a tree, the individual
leaves are not distinguished. The outline will be a little
blurred, a little inaccurate; but it will be as distinct and
detailed as the mind has power to grasp it, or the eye to see
it; the impression will be rightly given. There is a very
important principle involved in this method. The individual
members of a group vary continually, the whole group varies
very slowly. It is impossible to follow or measure the motions 
of separate atoms ; it is comparatively easy to state the laws of 
motion for a solid body. Great numbers and the averages 
resulting from them, such as we always obtain in measuring 
social phenomena, have great inertia. The total population, 
the total income, the birth and death rates, average wages, 
change very little; similar quantities relating to a single 
family change very fast. It is this constancy of great numbers 
that makes statistical measurement possible. It is to great 
numbers that statistical measurement chiefly applies. 
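
The arithmetic behind this remark can be sketched in a few lines. The wage figures are hypothetical, chosen only to echo the scale of the examples in the text (shillings a week), and the function names are assumptions of the sketch; the point is that one exceptional earner barely moves the average of a large group, though he would dominate that of a small one.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def shift_from_one_exception(group_size, usual_wage, exceptional_wage):
    """Change in the group mean when one member earns the exceptional wage."""
    ordinary = [usual_wage] * group_size
    with_exception = [usual_wage] * (group_size - 1) + [exceptional_wage]
    return mean(with_exception) - mean(ordinary)


# Hypothetical figures: 40 shillings a week is usual, 2000 is exceptional.
small = shift_from_one_exception(5, 40, 2000)
large = shift_from_one_exception(10_000, 40, 2000)
print(small, large)
```

With these invented figures the average of five persons rises by several hundred shillings, while that of ten thousand rises by a fraction of a shilling, which illustrates the inertia of great numbers.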

The relation of statistics to political economy is a simple
one. Professor Marshall says,* "Statistics are the straw out
of which I, like every other economist, have to
make the bricks." The statistician furnishes the
political economist with the facts, by which he
tests his theories or on which he bases them. Since the economist
deals chiefly with phenomena relating to groups, and
regards the individual only as a member of a group, it is to
statistics as the science of averages that he looks for his information.
When he is dealing with national economy, with
the volume of trade, for instance, or the purchasing power of
money, he is limited to pure theory, till statistics as the science
of great numbers has provided the facts. The chemist experimenting
in his laboratory is like the statistician; the chemist
theorising in his study is like the economist. Because of this
relation it may be held to be the business of the statistician to
collect, arrange, and describe, like a careful experimentist, but
to draw no deductions; even in an investigation relating to
cause and effect, to present evidence but not conclusions. As
a distinct operation, of course, the statistician may assume
the role of the economist, for the same man may well be qualified
to conduct the experiment and fit the theory. And just
as a theoretical chemist will have little or no power unless he
fully appreciates experimental methods and difficulties, even if
he has not the manual dexterity to conduct them to perfection
himself, so no student of political economy can pretend to
complete equipment unless he is master of the methods of
statistics, knows its difficulties, can see where accurate figures
are possible, can criticise the statistical evidence, and has an
almost instinctive perception of the reliance that he may
place on the estimates given him.

* Evidence to the Committee on the Census, 1890.

The proper function, indeed, of statistics is to enlarge individual
experience. An individual is limited to what he can
himself see, a very small part of one division of
the social organism; his knowledge is extended in
various ways, by the conversation of his acquaintance,
by newspaper reports, by the writings of experts. According
to his ability and power of judgment, he will be able to
form a correct view of the numerical importance of groups of
persons and things; but it is in the highest degree improbable
that he will not have been biassed by the peculiarities of his
position, and that he will place his different items of information
in the right perspective; and he will not be able to gauge
rightly the accuracy of his data. As soon as he begins to
examine these points he is undertaking a statistical investigation,
and will very soon find himself involved in all the difficulties
and problems from which a knowledge of statistical
method alone can disentangle him. This is the obvious
answer to those who deny the use of statistics. A statistical
estimate may be good or bad, accurate or the reverse; but in
almost all cases it is likely to be more accurate than a casual
observer's impression, and in the nature of things can only
be disproved by statistical methods.

A chief practical use of statistics is to show relative importance,
the very thing which an individual is likely to misjudge.
Statistics are almost always comparative. The
absolute magnitude of a quantity is of little
meaning to us till we have some similar quantity with which
to compare it. A statement of the number of paupers in the
United Kingdom is valueless unless we know the total population.
A statement of the number of gallons of water supplied
per head to the people of East London is of little meaning to
us till we know the quantity supplied to other towns. The
average wage, shown in the Wage Census, does not convey its
full significance till we have similar computations for other
countries or relating to other years. In the case of most
statistical estimates, it will be found that we need another for
comparison before we can appreciate the meaning of the first.
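
A minimal sketch of why the absolute figure alone tells little: the same count of paupers means something quite different in populations of different size. All the numbers below are hypothetical and are not taken from any official return; the district names and the function are assumptions made for the illustration.

```python
def rate_per_thousand(count, population):
    """Express a count as a rate per 1,000 of the population."""
    return 1000 * count / population


# Hypothetical districts with the same absolute number of paupers.
districts = {
    "District A": (5_000, 100_000),
    "District B": (5_000, 400_000),
}

rates = {name: rate_per_thousand(c, p) for name, (c, p) in districts.items()}
for name, rate in rates.items():
    print(name, rate)
```

The two invented districts report identical totals, yet the rate in the first is four times that in the second; only the comparison with the population gives the count a meaning.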

If the group of objects which we wish to measure is large,
its enumeration will be beyond our unassisted efforts, or those
of any organisation at our command. Some
investigations, indeed, have been successfully
conducted by private organisations, for instance, those which
resulted in Booth’s Life and Labour of the People, Leone Levi’s
Wages and Earnings, and Rowntree’s Poverty; and the method
of samples has also been used (e.g., in Livelihood and Poverty,
by the present author and Burnett-Hurst) to reduce an inquiry
to manageable dimensions; but in general the measurement
of a part of the social body or industrial organism must be
undertaken by the central or local governments, if it is to be
successfully carried out. The fact that this is the case explains
the heterogeneity and the imperfection of the mass of statistics
extant. A government primarily collects numerical information
only in relation to its own functions. Thus the administration
must know the numbers of the population and the area of
the country in gross and in detail for its own purposes. Large
groups of figures come simply from the necessity of public
account-keeping. Many official figures are bye-products;
for office purposes an account is kept of all transactions in
which the government has a hand, and of industries subject
to special regulations; and the government publishes most
of the figures which thus come in its way. To such causes
have been due our knowledge of the statistics of income,
education, imports, railways, mines, factories, and so on.
Though few figures are collected simply for scientific purposes,
yet in many cases schedules issued for administrative ends are
used at the same time for the reception of other information,
of use chiefly to the sociological student; much of the Census
information comes under this heading. A view of those
figures, relating to the United Kingdom, which are easily
accessible to the student, can be obtained by turning through
the annual Statistical Abstract for the United Kingdom, the
Annual Abstract of Labour Statistics, and the Registrar-General’s
Annual Report; in one or other of these, summaries of, and
references to, most official statistics are to be found.

It is clear that figures collected simply in connection with 
administrative purposes are not likely to be precisely those 
which are needed by the student of sociology or
political economy. Even where the wants of the
official and the student are nearly identical, the classification
and tabulation may not meet scientific requirements. There 
has, indeed, been considerable progress in recent years, in 
the direction of amassing statistical information not absolutely 
needed by the administration, and much of the work of the 
Labour Department of the Board of Trade (now merged in 
the Ministry of Labour) was of this kind ; but very much more 
might reasonably be done, at an expense which would be almost 
negligible when considered in relation to the national income. 
Thus the census might be made, in part at least, quinquennial, 
and the body of workers, who are organised once in ten years 
to conduct it, only to be disbanded when the report is issued, 
might be made permanent and entrusted with the carrying out 
of other inquiries on a national scale. Market and retail 
prices of many staple commodities could be tabulated, ana- 
lysed and published. Movements of goods by rail could be 
tabulated in the same way as transport by water, and the 
anomaly that we know more of our foreign than of our home 
trade be removed. Records of home production need not be 
confined to agriculture, mining, and steel works, but extended
on the lines of the Census of Production of 1907 till we know 
every year the output of the principal industries. Above all 
a central statistical office is needed which should co-ordinate 
all existing statistics and, working directly or through the 
appropriate Departments, aim at completing and perfecting 
a continuous statistical account of the nation. It needs very
little study of statistics or of political economy
to feel the pressing need of more and better
co-ordinated information; illustrations of the gaps in our
knowledge are easily found. When dealing with our national
income we can obtain statistics of wages, and of income subject
to tax; but for salaries below the exemption limit, and for
part of the income received from foreign investments, we are
forced to rely on educated guesses. For the change of the
purchasing power of money we know, thanks chiefly to the 
Economist and trade newspapers, the course of wholesale 
prices, but many interesting calculations are brought to a
standstill because of the imperfection of the records of retail 
prices. With regard to wages, we can estimate fairly accurately
standard and average wages, but, in default of an
industrial census, do not know how many persons are in 
receipt of each given wage, nor the relative numbers of masters 
and men. Till there is a public demand for such information, 
it will need a very enlightened government to spare the time, 
trouble, and the relatively small sums of money necessary 
for a systematic attempt to fill up these gaps ; but every one 
can do something towards this enlightenment, and in further- 
ance of this demand, by studying what has been done in other 
countries, and building up a knowledge of the science of 
statistical investigation. 

The absence of such a demand is perhaps due to a widely 
spread and not unreasonable distrust of statistical estimates,
crystallised in the common remark that “ anything
can be proved by statistics.” This is to a
great extent the fault of the criticising public themselves :
they are always requiring and the newspapers always supplying 
information, which depends on a statistical basis, but for 
which good statistics are not to be found for one or other of 
the reasons already indicated. The informant
must perforce turn to inaccurate estimates, and
the public has no knowledge or discrimination as to what 
estimates rest on satisfactory data, or indeed as to what 
quantities are capable of statistical evaluation. Again, figures 
which cover only part of the subject, such as the Wage Census 
average, or the Labour Gazette returns of unemployed, may be 
quoted as universal; mere estimates, made for quite other 
purposes, may be given as accurate and complete; and on 
such unreliable premises arguments are based, which naturally, 
by a judicious choice of material, can be made to support any
theory at pleasure. It will generally be found that the statistician,
on whose authority such statements are supposed to be
based, is not to blame. Some of the common ways of
producing a false statistical argument are to quote figures
without their context, omitting the cautions as to their incom- 
pleteness, or to apply them to a group of phenomena quite 
different to that to which they in reality relate; to take 
estimates referring to only part of a group as complete; to 
enumerate the events favourable to an argument, omitting
the other side ; and to argue hastily from effect to cause, this
last error being the one most often fathered on to statistics.
For all these elementary mistakes in logic, statistics is held 
responsible. 

Perhaps statisticians themselves have not always fully
recognised the limitations of their work. At best they can
measure only the numerical aspect of a phenomenon;
while very often they must be content
with measuring not the facts they wish, but some allied quantity.
We wish to know, for instance, the extent of poverty,
its increase or diminution : poverty we cannot define or 
measure, and we cannot even count the number of the poor; 
all we can do is to state the number of officially recognised 
paupers, and add perhaps some estimates from private sources ; 
but this gives us no clue to the intensity of poverty in individual
cases. Or we wish to obtain statistics of health : but
the principal measurements made are of the death-rate and 
average length of life, and the prevalence of some diseases, 
very different matters. The statistician's contribution to a 
sociological problem is only one of objective measurement, 
and this is frequently among the less important of the data ; 
it is as necessary, however, to its solution as accurate measure- 
ments are for the construction of a building. 



CHAPTER II. 


THE GENERAL METHOD OF STATISTICAL 
INVESTIGATION. 

At first sight it will seem as if there were no method common 
to all statistical investigations, and indeed the processes differ 
so widely that it is not easy to outline a scheme which will 
include them all; but the following sequence is generally 
indicated * as of general application, and will serve at least 
to thread an examination of methods together : (1) the Collection
of Material, (2) its Tabulation, (3) the Summary, and
(4) a Critical Examination of its results. The first three 
processes will be discussed in detail in the following chapters. 

It may be well to state what equipment is necessary for the 
student who wishes to learn statistical methods. In collection 
and tabulation common-sense is the chief requisite,
and experience the chief teacher; no more than
expertness in quite simple arithmetic is necessary
for the actual processes ; but since, as we shall
see immediately, all the parts of an investigation are inter- 
dependent, it is expedient to understand the whole before 
attempting to carry out a part. For summarising, it is well
to have acquaintance with the various algebraic averages, and
with enough geometry for the interpretation of simple curves;
though all the operations can be performed without the use of
algebraic symbols. For criticism of estimates and interpretation
of results, it is necessary to use the formulae of more
advanced mathematics, and it is obviously expedient to under- 
stand the methods by which these formulae are obtained to 
ensure their intelligent use. They are specially necessary for 
the comparison of complex groups, and for estimating the 
significance of a divergence from the average, or the deviations 
in a list of periodic figures, and quite essential in dealing with
correlation. 

* See, e.g., Dr. Bertillon's Cours élémentaire de Statistique, to which the
present author is indebted for some of the treatment in the following pages.

(1) Information is generally collected by issuing blank
circulars, forms of inquiry, to be filled in either by a few officials
or by many individuals, and the proper drawing
up of this form is one of the chief tasks in a good
investigation. Before this form is issued it is necessary to
formulate a complete scheme of the whole undertaking, and 
even to have some idea of what the resulting figures will be, so 
as to be able to arrange the details of the organisation on the 
right scale, and adjust the tools used to their purpose. As 
already pointed out, the object whose measurement is wanted 
is not in general exactly that which can be measured, and the 
measurable quantity nearest to it must be found ; e.g., when 
the average annual earnings of the working class were in ques- 
tion, the quantity first measured was the average weekly wage. 
Then some technical knowledge of the particular subject is 
needed ; and, if not possessed, a preliminary inquiry on a small 
scale may be necessary to show how to fit means to ends. 
The people who possess the information required must be 
discovered and interrogated at first hand. The questions put 
must be those which will yield answers in a form ready for 
tabulation, and the scheme of tabulation must
therefore be thought out beforehand. The questions
must be so clear that a misunderstanding is impossible,
and so framed that the answers will be perfectly definite, such 
as a simple number, or “ yes ” or “ no.” They must be such as 
cannot give offence, or appear inquisitorial, or lead to partisan 
answers, or suppression of part of the facts. The mean must 
be found between asking more than will be readily answered 
and less than is wanted for the purpose in hand. The form 
must contain necessary instructions, making mistakes difficult, 
but must not be too complex. The exact degree of accuracy
required, whether the answers are to be correct to shillings or
pence, to months or days, must be decided. Every word and 
every square inch of space must be keenly criticised. A 
little trouble spent upon the form will save much inconvenience 
afterwards.

(2) In considering what method is to be adopted for tabula- 
tion, we must remember that the investigation is intended to 
furnish the answers to certain definite questions (how many
people, what wage, what price), and
each column must present some total which is relevant to these
questions. The exact scheme employed will differ in different
inquiries. In the population census, much of the tabulation is
almost automatic ; in the wage census, the best and simplest
way to show the grouping about the average wage in each
occupation had to be specially devised ; in trade statistics the
number of different categories to be adopted and the limits 
of each raise difficult questions. In general, the scheme of 
investigation requires knowledge of certain groups; and the 
totals resulting from tabulation should show the numbers of 
items in these, so that after tabulation, instead of the chaotic 
mass of infinitely varying items, we have a definite general 
outline of the whole group in question.

(3) When the raw material is worked up to this point, skill 
of a different kind is wanted. From the numbers obtained, we
have to pick out the significant figures; so to
present the totals and averages as to give a
true impression to an inquirer; to summarise briefly the 
information obtained; to concentrate the mass into a few 
significant averages, and to describe their exact meaning in 
the fewest and clearest words, for it is the result of this concentration
which will generally be used and quoted. To do
this skilfully requires an acquaintance with the method of
averages and the use of diagrams. It may further be necessary 
to fill in unavoidable gaps in the figures in order to supply esti- 
mates for intermediate years ; this needs a study of the danger- 
ous method of interpolation. Finally, a verbal description of 
the process, its genesis and results, and an estimate of its 
accuracy must be written, and then the investigation is complete. 

(4) The student who has to make use of statistics should not
be content to take the results of an inquiry on authority, but
ought to acquaint himself with all these details of
method. Before the results can be criticised, it
is necessary to know the complete genesis of the figures;
whether the whole field was covered; exactly whence the 
information tabulated was obtained; whether there was a 
possibility of bias; how nearly the individual answers were
correct; whether the informants really knew the facts they
related, and if they were likely to state them correctly. The
published statement of the results should show clearly the 
whole scheme of collection so as to make this criticism possible ; 
in particular, specimens of the original blank forms should be
included, so that the reader can judge whether the original
answers lead definitely and exactly to the tabulated results. 

Internal evidence often leads to much useful criticism. It
can be seen whether the number of returns for each group is 
proportional to its importance, or if a specially important 
figure depends on only slight evidence. The continuity of the
figures can be examined, and the causes of sudden gaps in- 
vestigated. The returns can be divided into sample groups, 
and the extent of the correspondence of these groups with the 
general result will often indicate whether the returns are 
sufficiently general. A careful study of the more minute 
tabulations may show within what percentage the final numbers 
may be expected to be correct. A critical examination of this 
kind will often show that the information obtained is insuffi- 
cient to lead to precise results, and then attention should be 
directed to estimating the magnitude of the effect of omissions 
and inadequacy of data. 
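
The sample-group check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the figures in `returns` are invented weekly wages in shillings, and the function name and the choice of three interleaved groups are hypothetical, not taken from any inquiry discussed in the text.

```python
from statistics import mean

def sample_group_means(returns, k):
    """Split the returns into k interleaved sample groups and average each."""
    groups = [returns[i::k] for i in range(k)]
    return [mean(g) for g in groups]

# Hypothetical returns (weekly wages in shillings); not real data.
returns = [20, 22, 25, 24, 30, 28, 21, 23, 26, 27, 29, 25]

overall = mean(returns)                      # the general result
group_means = sample_group_means(returns, 3) # results from each sample group

# Largest divergence of any sample group from the general result,
# expressed as a percentage of the general result.
max_gap_pct = max(abs(m - overall) for m in group_means) / overall * 100

print(overall, group_means, max_gap_pct)
```

If the sample groups agree closely with the general result, the returns are more likely to be sufficiently general; a wide divergence suggests that the final figure rests on too slight evidence.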

A most important function of statistics is to produce 
evidence showing the relation of one group of phenomena to 
another; for the information obtained is presumably intended 
as a guide for action; the guidance is generally needed to show
what actions are likely to produce certain desired effects, and 
this is best investigated by finding how such effects have been 
produced in the past. We have then to determine whether 
changes in one measurable quantity have produced changes 
in another ; a problem very often insoluble, but one on which 
most light can be obtained by the study of the relevant statistics 
in the light of mathematics, the mathematics of probability, 
and it is in this particular branch of mathematics that recent
statistical progress has been chiefly made. 

Such questions, however important, are somewhat abstruse, 
and presuppose a certain amount of technical knowledge which
is not in the possession of the general student. The plan of 
this book is to postpone all questions requiring such technical 
or mathematical knowledge to the Second Part, and to confine 
our earlier discussions to problems needing no special training
or equipment. 



CHAPTER III. 

DEFINITION OF UNIT. COLLECTION OF DATA.

Preliminary. Definition of Unit.

Almost the first question in the initiation of an investiga- 
tion is, What is to be counted?, and nearly the last question 
when the tabulation is completed is, What has been counted ? 
The answer to the former gives the preliminary definition, 
that to the latter shows how it had to be modified in practice. 
The essential difficulties of definition come, first, from the 
need of interpreting conceptions conveyed or obscured by 
ordinary words into entities capable of enumeration, and, 
secondly, from determining the things that can actually be 
counted which are nearest to the entities of which knowledge 
is desired. Thus we may be investigating overcrowding or loss
of work through unemployment. Overcrowding
is expressed numerically in the relation between
persons and room or air-space, and differs with the age and 
sex of the members of the household and the ventilation and 
light of the rooms. In practice, persons only can be counted 
(without detailed reference to their needs), and the number 
of rooms (a room being defined rather arbitrarily) or their 
cubic contents can be recorded. Loss of work is expressed 
numerically in the number of ordinary working days on which
no paid work was done. In practice, those are counted as
unemployed who satisfy certain formalities at trade union or
Labour Exchange offices, such as signing a register at a particular
hour each day. The definition of “ number unemployed ”
depends on the regulations relating to these registers,
and among unemployed are included only those groups of 
persons who come within their scope. “ Overcrowded ” in
the usage of the Census reports means that the number of
persons enumerated in a tenement is more than twice the
number of rooms in it, a room being defined so as to exclude
bath-room, scullery, etc.
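
The Census usage of “ overcrowded ” is definite enough to be written as a rule. The sketch below assumes only what the text states (more than two persons per room, with rooms already counted so as to exclude bath-room, scullery, etc.); the function name and the sample tenements are hypothetical.

```python
def is_overcrowded(persons, rooms):
    """Census usage as stated in the text: persons exceed twice the rooms.

    `rooms` is assumed to be counted already under the Census definition,
    i.e. excluding bath-room, scullery, etc.
    """
    return persons > 2 * rooms

# Hypothetical tenements as (persons, rooms); not real census returns.
tenements = [(5, 2), (4, 2), (7, 3), (3, 4)]

overcrowded = [t for t in tenements if is_overcrowded(*t)]
print(overcrowded)
```

A tenement with exactly two persons to each room is not counted, since the definition requires more than twice the number of rooms.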

It must be realised that the words describing statistical
totals or averages, such as population, imports, tonnage of ships
entered, average price, cost of living, occupied, wages, income,
capital, are technical terms, whose significance is always more
definite than that usual in conversation or writing, and may
have some essential difference from that in common usage.
These terms are capable of exact definitions, which can only 
be ascertained from the original reports in which the totals 
are obtained, and it often happens that these reports leave 
serious ambiguities unsettled. The sections that follow in 
this chapter illustrate the examination of the raw material 
of investigations with a view to ascertaining the exact meaning 
of the totals obtained. 

It is necessary in stating totals or averages to be as explicit 
as is possible without too much verbiage, and to give definitions 
which are too complex for a simple heading in
juxtaposition to the table which contains them.

Thus in coal production we should not speak of “ output per
worker,” but of “ number of tons of coal brought to the surface
in the week beginning January 25th, 1920, in the aggregate
of the coal mines of Great Britain, divided by the average
number of persons employed underground in that week,” or
if this is too complex all these points should be clear from the 
context or sub-headings or foot-notes, and an explanation 
should explain how the average number of persons employed 
was computed. 

A percentage should never be given without a phrase
showing on what it is measured. Thus if the price of some
commodity was £80 at a previous date and is £100 now, the
increase is 25 per cent. of its earlier price and 20 per cent.
of its present price. If it now fell 25 per cent. of its present price
it would reach £75 ; but if it fell 25 per cent. of its earlier price,
it would return to £80. If wages are raised four times by
10 per cent. of a standard, starting at that standard, the wages
are 100, 110, 120, 130, 140 per cent. of the standard; but
the increases in each period measured as percentages of the
wage at the beginning of that period are respectively 10, 9.1,
8.3, 7.7 approximately.
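
The arithmetic of the preceding paragraph may be checked with a short sketch. The function name and the `base` argument are illustrative assumptions; the figures are those given in the text.

```python
def percent_change(old, new, base):
    """Change from old to new, expressed as a percentage of `base`."""
    return 100.0 * (new - old) / base

earlier, present = 80.0, 100.0

# The same rise of 20 pounds, measured on two different bases.
rise_on_earlier = percent_change(earlier, present, base=earlier)
rise_on_present = percent_change(earlier, present, base=present)

# Four successive rises, each of 10 per cent. of the standard.
standard = 100.0
wages = [standard + 10.0 * k for k in range(5)]

# Each rise measured on the wage at the beginning of its own period.
step_rises = [round(percent_change(a, b, base=a), 1)
              for a, b in zip(wages, wages[1:])]

print(rise_on_earlier, rise_on_present, step_rises)
```

Making the base an explicit argument serves the same purpose as the phrase the text requires: the percentage cannot be stated without saying on what it is measured.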

A useful way of ensuring explicitness in a complex defini- 
tion, of special importance in schemes of tabulation, can be 
illustrated as follows. In a table presented to
the Income Tax Commission we find the sum
£1,970,000,000 as the total of taxable income, explanations
being given in introductory notes. The definition of this
total may be exhibited thus :

A. Income.
B. Known to the tax commissioners.
C. As defined by the laws and instructions for assessment.
D. Less allowances for wear and tear, etc.
E. Of persons and corporations in the United Kingdom
   and of non-residents so far as they are subject to tax.
F. Assessed for the fiscal year 1918-19.

Each of the six phrases expresses a characteristic or attribute 
possessed by every unit in the total, and the exact definition 
of these attributes leads to the definition of the total, and to 
the answer to the question “ what has been counted ? ” 

We should generally include as characteristics, the fact of 
record (B), a date (F) and a place (E). 
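
The same idea can be sketched as a set of tests which every unit must pass before it enters the total. All names and figures below are hypothetical and chosen only to mirror attributes A to F; they are not drawn from the Income Tax Commission's table.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    gross_income: float           # A, measured as the assessment rules define it (C)
    allowances: float             # D, wear and tear, etc.
    known_to_commissioners: bool  # B, the fact of record
    uk_resident: bool             # E, place
    subject_to_tax: bool          # E, non-residents so far as taxable
    fiscal_year: str              # F, date

def counted(u, year="1918-19"):
    """A unit is counted only if it possesses every defining attribute."""
    return (u.known_to_commissioners
            and (u.uk_resident or u.subject_to_tax)
            and u.fiscal_year == year)

def taxable_total(units, year="1918-19"):
    """Sum income less allowances over the units that are counted."""
    return sum(u.gross_income - u.allowances for u in units if counted(u, year))

# Invented units for illustration only.
units = [
    Unit(500.0, 50.0, True, True, True, "1918-19"),
    Unit(300.0, 0.0, False, True, True, "1918-19"),   # fails B: not on record
    Unit(200.0, 20.0, True, False, True, "1918-19"),  # non-resident, subject to tax
    Unit(400.0, 40.0, True, True, True, "1917-18"),   # fails F: another year
]

total = taxable_total(units)
print(total)
```

Writing each attribute as a separate condition makes it plain which units have been excluded and why, which is the answer to “ what has been counted ? ”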

Section i. — The Population Census. 

The population census will provide good illustrations of the
principles laid down in the last chapter, both because we shall
be at first on familiar ground, since every one
knows its scheme, purpose, and details, and because
the form of inquiry used for the collection of the original data
brings out very prominently the difficulties met with in detailed
statistical investigations.

The first thing to be considered is the exact object for 
which the census is undertaken. It is for demographical purposes ;
to supply information as to the numbers
and local distribution of the population, the
numbers of each sex and age, their so-called civil condition
(i.e., whether single, married, or widowed), and their nation- 
ality. This is the minimum information necessary for adminis- 
trative purposes. In addition to these facts there are very 
many others which the statesman and the economist wish to
know about each member of the population, and the census
form is the only means in England of collecting universal data ;
the question as to which of these shall be investigated and
which neglected, is decided more by expediency
than on principle. Of these desiderata the following
may be mentioned : the size and structure of the family,
its position in the social scale, the economic position of its
head ; the nature of employment of its members, the wage or 
income of each member and of the family as a whole, the rent 
and size of their house, their educational condition, the ages at
which they commenced or retired from work, their migrations,
their combination in religious or other bodies, and their infirmi- 
ties. It is clear that some of this information must be dis- 
pensed with, if the form is not to be overcrowded, and if the 
tabulation is to be finished in any reasonable time ; and an
examination of the general nature of the questions which can 
suitably be put will show how the necessary selection is made. 

First, the questions must be those which the informant is 
able to answer. Now, if the questions were only to be put 
to educated and methodical persons, doubtless a
full account could be given of the family migrations
and of the ages at which each member had been at work ;
but the peculiarity of the census is that it is universal, and 
the questions must be such that the least educated and most 
unthrifty householder shall be able to answer ; in many cases
such facts would have been unrecorded and forgotten. 

Secondly, the questions must be perfectly definite, so that 
there can be no doubt as to what the right answer should be. 
The only answers which are of value to the 
statistician are “ yes,” “ no,” or a simple number, 
or a definite place or date or the use of a word that has a 
precise meaning. Adjectives and adverbs such as many,
often, partly, etc., bear different numerical meanings to 
different people, and, though they may express fairly clearly 
the position of an individual, are nearly useless for tabulation,*
which is their only purpose so far as the census is concerned.
Thus the question as to education would have to be, not
“ state whether well, moderately, or badly educated,” but
“ state at what age school was left,” or “ how many years at
school ? ” But even if such questions were not excluded by
our first test, by the forgetfulness of the informant, the statements
given would be of little practical value, and very often
incorrect. An inquiry as to wage and income could not be
made sufficiently definite without so many questions as to
require a form to itself ; for wages, as we shall see when considering
the Wage Census, require very careful definition, and
many subsidiary questions must be put to get a proper estimate ;
the simple query, “ what is your weekly wage or annual income ? ”
would be answered on so many varying principles
that the result would be valueless.

* But see p. 121, infra.

Thirdly, the questions must be such as will be answered 
truthfully and without bias. There is hardly a demand on 
the census form which would not be excluded, if
this rule was too rigorously enforced, as we shall
see immediately. Perhaps the most difficult in this respect 
is the question, Employer or employed ? For though there are 
many cases in which a man is both employer and employed so 
that this question should be excluded by our second test, many 
persons consciously exaggerate their social importance by 
erroneously replying the former. Questions relating to social 
position must generally be excluded by this rule. 

Fourthly, the questions must be those which will be answered 
willingly, and must therefore not be inquisitorial, or such as 
to raise apprehension of a change of law or an
imposition of taxes. Questions as to membership 
of trade unions, or of friendly societies, or as to insurance, 
would be thought inquisitorial. Many would refuse to state 
their incomes, holding it to be no one’s concern but their own. 
Questions as to rent might be regarded as possibly leading 
to taxation. Questions as to religion are badly answered, as 
was shown in the evidence before the Census Committee of 
1890,* and should be excluded in England by each of these
four rules. Some persons do not know what their religion 
should be named, others would find the question indefinite, 
others would deliberately answer wrongly, and many not at all. 

* Report of Committee on the Census, 1890 (C. 6071).

The questions on the census form † not excluded on one
or other of these grounds are Nos. 1, 2, 3, 4, 14 and 15 ; these
are fairly definite, and householders are generally able and
willing to give correct answers to them. Question 5 may be
inaccurately answered in cases of divorce, separation, or
irregular unions. Questions 6, 7, 8, 9 were first introduced in
1911, and though there were many inaccuracies, the answers
have given important new information. With regard to
questions 10 and 11, there has always been difficulty in distinguishing
between a classification according to the nature of
the work a person is doing (e.g., as a clerk or a carpenter)
and a classification according to industries (e.g., where the
clerk and carpenter are employed by a textile firm) ; in 1911
questions 10 and 11 were devised with a view to making a
double tabulation possible. Question 12 has failed to give
complete information as to the status of a worker, and question
13 is inadequate for many cases. No provision is made
for persons who follow two equally important occupations.
Question 16 is not definite and leads to no important results.
A further discussion of the merits of some of these is to be
found in the Report of the Committee already mentioned ; *
here it is only intended to indicate the general grounds of
inclusion or exclusion.

† Facing this page.

So far we have not discussed the important question as to 
who should fill in the form. If, as in the English Census, 
it is to be filled in by the householder, the questions
must be much simpler in matter and words
than if it is to be filled in by an official teller. In the latter
case the form may be much more complicated, the questions 
more inquisitorial and such as might lead to indefinite answers 
on the part of ignorant people ; for the teller would insist on 
an answer, be able to exclude those obviously wrong, and 
cross-question till the indefinite answers were so altered as to 
allow definite tabulation. In a great and complex under- 
taking like the Census, where many tellers must be impressed 
for a short period, their instructions and the general plan must 
be sufficiently simple; but as the extent of an inquiry contracts,
the tellers can receive more complete instructions, and
the information requisitioned may be more complex. This is
of most importance in connection with columns 10-13. 

The general shape and appearance of the sheet need atten- 
tion. If the structure of the family is to be shown, the answers 
are best given on a single sheet, which must
contain enough lines for the largest ordinary
household, so that the trouble of fastening together of many
couples may be avoided, and tabulation not be hindered. The
spaces must contain plenty of room for answers in uneducated
handwriting, without making the whole so large as not to lie
easily on a desk. The instructions must be distinct and visible,
and placed in close connection with the answers ; to further
this, a skilful use may be made of capitals, italics, and different
founts of type. On the form facing p. 22, those in use are
roughly reproduced in miniature.

* See also the Statistical Journal, 1908, p. 496, and 1920, p. 134.

The form should always show for wfiat purpose the figures 
are collected, and how they will be used, in order to enlist the 
support of the informant and allay misapprehension.
The extent to which this should be done
depends a good deal on whether the filling-up is compulsory, 
as in the population census, or voluntary, as in the wage 
census. In the case before us no preamble is necessary, since 
every one knows the main features of a census, and most are 
willing to further its objects ; but it must be shown that the 
inquiry is sanctioned by Parliament, and that compliance is 
compulsory. This is done on the back, on the fold which is 
outside before the form is opened ; and even though penalties 
are threatened against absence of or falsification of returns, the 
last sentence on the back and a statement on the front of the 
form guarantees the informant against injurious or personal 
use of his answers. Where information is voluntary, a careful 
letter should be printed and circulated with the form, per- 
suading the informant to give his assistance. 

While the main part of the form is filled in by the house- 
holder, other parts are filled in by the officials, and with very 
little trouble a good deal of subsidiary information
can be collected in this way. On the outside the
Registration district and sub-district, enumerator’s district and
the postal address are written, from which the numbers can
be tabulated for any of the areas required. The teller could 
also, as he took the form, enter the number of stories to a 
house, which is not done in the English Census, and other 
information as to the style of house and street might be 
endorsed. In a % more intensive investigation, expert assistants 
could be trusted to come out of a house with an accurate 
knowledge of many fliteresting details. 

We can now proceed to the individual criticism of the form
in the light of the rules suggested above. In the first place,
even the arrangement of columns is not perfect. To labourers
who are not in the habit of writing at all, and who have (to
judge from election posters) to be instructed how to put their
mark in the right place on a ballot paper (many papers being
destroyed simply through ignorance), this arrangement of
horizontal and vertical columns would be confusing, and
without help they would not gather at all what they were to
do. They would fill up more easily a paper in which the
answers were to follow the questions immediately:

State your Name 

State your Age 

State your Sex 

Unmarried, Married, or Widowed

and so on. 

This form, however, could only be used if a separate paper were 
to be filled in for every individual, children and all, as is the 
case in France. 

The first question, which for the general purpose of the 
census should be the most definite of all, leaves some room for 
doubt. What constitutes "passing the night" in the case of a
night-watchman returning at 4 a.m., or of a printer at 2 a.m.?
How is the householder
to know whether any of his establishment are returned else- 
where? Since too many instructions only lead to confusion, 
the tellers should be specially taught the answers to such 
questions. 

The very meaning of the phrase "population of a district"
is open to much doubt. In France "la population de fait,"
which consists of all present in the given district at the given
moment, is distinguished from "la population de droit," which
consists of all usually resident in the district, including those
temporarily absent, and excluding those only momentarily
present, and from "la population municipale," which is "la
population de droit," less prisoners, hospital patients, scholars
resident in schools, members of convents, the army, and so on.*
The English Census has counted only "la population de fait."
In the United States in 1890 we find a "constitutional
population," which excludes residents in Indian Reservations,
the Territories, and the District of Columbia; the "general
population," which includes in addition the Territories (except
the Indian Reservations, Indian Territory, and Alaska); and
the "total population," which includes all excluded in the
former.* For 1910, the population as generally quoted is that
of continental U.S.A., viz., forty-eight States (including
Arizona and New Mexico, formerly territories) and the District
of Columbia. We find also a total which includes Alaska,
Hawaii, Porto Rico, and the army and navy abroad, and
another total which adds to these the estimated population of
the Philippines, Guam, Samoa, and the Panama Canal zone.
For the apportionment of taxes the population of the District
of Columbia and Indians are subtracted from the continental
population. Notice that the Channel Islands and the Isle of
Man are included in the English Census enumeration, but not
in the total generally quoted. Also an account of soldiers and
seamen at sea or abroad is given in a table, but they are not
included in the total.

* See Bertillon, ibid., p. 146.

It is possible to find difficulties in filling up each of the 
columns, owing to ignorance or ambiguity. For illustration, 
consider how column 2 should be filled in in the case of a cousin 
who was a "paying guest,” or a relation who was a visitor; for 
column 5, is a divorced person single or a widower, and what of
a woman who is doubtful whether her husband is lost at sea ? 

It is well known that columns 3 and 4 are wrongly filled
in for two reasons: one, that elderly people often do not
know their ages accurately and enter them to the nearest
round number, so that the returns congregate at 40, 50, 60;
the error thus arising is eliminated by tabulation in the
groups 35-45, 45-55 years, etc., and for more minute
tabulation the groups 3-7, 8-12, 13-17, etc., are suggested;
the other is that many women habitually enter their ages
too low; in this case also the Registrar-General is able
to deduce nearly correct totals.
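
The grouping suggested for minute tabulation (3-7, 8-12, 13-17, etc.) places each round number at the centre of its group, so that ages wrongly returned at the round number still fall in the right group. A minimal sketch in Python follows; the function names and the list of stated ages are hypothetical illustrations, not census figures, and the width of the group is assumed to be odd.

```python
from collections import Counter


def centred_group(age, width=5):
    """Bounds (low, high) of the group centred on the nearest multiple of `width`.

    `width` is assumed odd, so that every whole age falls in exactly one group.
    """
    half = width // 2
    centre = ((age + half) // width) * width
    return max(centre - half, 0), centre + half


def tabulate(ages, width=5):
    """Count stated ages by centred group."""
    return Counter(centred_group(a, width) for a in ages)


# Hypothetical returns, heaped at 40, 50 and 60 (not taken from any census).
stated_ages = [38, 40, 40, 40, 41, 43, 47, 50, 50, 50, 52, 58, 60, 60]
for (low, high), count in sorted(tabulate(stated_ages).items()):
    print(f"{low}-{high}: {count}")
```

Whatever the true ages of those who answered 40, they are counted in the group 38-42, which is the property the grouping is chosen for.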

It is to be noticed that, since the ages stated are those
"last birthday," the age will on the average be given six
months too low, and, in fact, the ages given as 17, e.g., should
be scattered nearly uniformly over the months of the eighteenth
year.

* Willcox: Area and Population of the United States at the XI. Census,
a book which gives a very useful criticism of the accuracy of the most
elementary data of statistics.

The most important criticisms of the census-schedule are to
be made on columns 10-13. It will not be expedient here to go
into all the questions raised before the Committee on the
Census as regards an industrial census. While there can be
little doubt that a thorough census of occupations would be
best undertaken separately, and on somewhat different
principles from the population census, it is certainly better,
till opinion is ripe for so radical a change, to include in the
present census the best questions we can as to occupations,
than to omit them altogether in despair of accurate
results. In any case, a census of occupations ought to be
co-ordinated with the general population census, otherwise 
great difficulties of interpretation arise. Some of these may 
be seen in the attempt to reconcile the statistics of the number 
of persons employed in the Report of the Census of Production 
(Cd. 6320, pp. 8-10) with the statistics of persons occupied 
according to the Census of Population. 

The objects aimed at, which we must always keep in mind 
when criticising special questions, are two : to find the number 
employed in each trade and industry, that is, so to say, to 
form vertical divisions ; and to find the number in each kind 
or grade of employment (labourer, artisan, employer, etc., or 
smith's striker, carpenter, weaver, etc.) in horizontal divisions;
so that the tabulation may give some such result as : — 

Textile Industries.

                 Cotton.    Wool.    Linen.    Totals.

Employers
Managers
Clerks
Overlookers
Spinners
Weavers
Labourers
Children

The necessary minimum of information would be given by
such answers as:

Legal – Solicitor – Managing clerk.

Mining – Coal – Hewer.

Metal-worker – Iron – Smith's striker.

Now the simple instruction, “ State your occupation,” would of 
course not lead to information of this sort. The coal-hewer 
would simply say miner; the clerk, managing clerk; the 
striker, very likely smith. To explain what is wanted and 
avoid mistakes, the informant is referred to the back of the
form, half of which is devoted to instructions relating to these 
columns. These are lucid, carefully picked out with capitals 
and italics, comprehensive, brief and to the point. No one 
who wishes to fill in the form rightly, and is sufficiently educated 
to understand simple instructions, can easily go wrong. Yet 
it is probable that these instructions are in very many cases 
neither read nor followed ; and this is very important in con- 
nection with the general study of blank forms of inquiry. 
Forms issued to people uninterested in the object in view will 
generally be filled in with the least possible expenditure of 
time and intelligence. Hence two courses are open : to reduce 
the question to the simplest possible form, and make the best 
of the result ; or not to allow the informants to write in their 
own answers, but to take them viva voce by means of a teller, 
who has mastered the instructions, and has the necessary legal 
force behind him to compel information. The latter course 
entails time and expense. 

The result of the present system of inquiry, combined with
a faulty method of tabulation, which it to some extent makes
necessary, is that we have no reliable census of occupations for
the United Kingdom. The present figures break down both
from faulty data and from insufficient tabulation directly we
attempt to make some of the important calculations depending
on them.

An attempt was made in 1891 to correct to some extent
our ignorance of the relative numbers of unskilled and skilled
labourers, employers and employed, by the question now in
column 12. The headings were not a model of clearness;
there was not the ordinary imperative "state" or "write,"
nor was one told on the front of the form whether to write
Yes or No or to make a mark in the appropriate column, nor
is the distinction between the three headings a perfectly
definite one; but still one was hardly prepared for the
following statement in the report:*

" In numerous instances, no cross at all was made ; in many 
others, crosses were made in two or even all three columns, and, 
even when only one cross was made, there were often very 
strong reasons for believing that it has been made in the wrong
column. Oftentimes this use of the wrong column can scarcely 
have been other than intentional ; being dictated by the foolish 
but very common desire of persons to magnify the importance 
of their occupational condition. This desire must have led 
many subordinates to return themselves as employers rather 
than as employed, for it is only on this supposition that we can 
account for the otherwise unintelligible fact that, under several 
headings, there are actually, according to the returns, more 
employers than employed, more masters than men. . . . We 
hold (these returns) to be excessively untrustworthy, and shall 
make no use whatsoever of them in our remarks."

The questions have, however, continued to be inserted and
the numbers tabulated, and statisticians have used the results 
with a certain confidence. 

This attempt and its results are of the greatest importance 
to all who try to draw up forms of inquiry. 

Before leaving the subject, it should be mentioned in passing
that we cannot deduce directly from our census the number of
persons dependent on a particular trade for their living; that
is to say, the number of employers, their families (not otherwise
returned) and domestic servants, and the number of
employés and their dependent families. This, the most
important total for estimating the relative importance of
different trades of the country, is not tabulated, though such
tabulation has been found possible in other countries, and
we are dependent on the estimates of statisticians for such
totals.†

* General Report on the Census of 1891, p. 36 (C. 7222 of 1893).
† See Booth in Statistical Journal, vol. xlix.

To see how the information given by the answers on the
census schedule can be worked up into detailed specific
numbers, it is only necessary to look at the diagram and
table prefixed to each of the sections relating to special trades
in Mr. Booth's Life and Labour of the People (e.g., vol. v.,
p. 46).*

Statisticians have generally to work with material provided 
for them; their first task is to understand exactly the definitions
under which the data were obtained and the limitations
of the tables published. In skilled hands quite faulty compila- 
tions have often been found to yield accurate results of great 
interest. 


Section 2. — The Wage Census. 

The main differences in method between the wage census, 
as taken in 1886 and 1906, and the general population census 
are — (1) That the filling up the forms in the wage census was 
voluntary; (2) that their correct filling up required a higher 
degree of intelligence and education. As before, we must
consider first the object which the wage census was intended
to fulfil: it was to describe the earnings of the people of the
United Kingdom, to compare the rates of wages trade by
trade, and to find the relative numbers earning at each rate.
What is the best quantity to measure with this object in view?
As a preliminary question, should we take the day, week, or
year as the unit of time? Clearly we shall not be able to
compute weekly wages if we only obtain daily, for the week's
work varies from four to seven days in different occupations.
The week's wage is a more definite quantity; but the simple
comparison of weekly wages in different trades will be
deceptive, because most trades are busier at one season of the
year than at another, and in many the difference between
season and season is very great; in any particular week, then,
we may be comparing the best season of one industry with the
worst of another. To avoid this error, and because we do not
know how many full weeks' wages are obtained in a year,
except in a few non-intermittent trades, it would seem best to
take the year as unit; but the direct calculation of an
individual's annual earnings is practically impossible. The
employer is not acquainted with this sum, for in large
establishments the hands are continually changing, and one
man will be paid by two or more masters in the same year;
and even in a factory with a nearly constant personnel, the
weekly amounts paid to individuals are not in general so
tabulated as to be easily summed, and the working out of the
totals would require a prohibitive amount of clerical labour.
If we turn to the workman, on the other hand, we shall find
in the majority of cases that no accurate account has been
kept of earnings through the year, and it would only be by
careful individual examination, impracticable on any large
scale, that an estimate could be made; in many cases the men,
even if willing, would be quite unable to give a connected
account of their earnings during the past twelve months.

* See p. 57, infra.

It seems clear that we must adopt a smaller unit, and since 
most wages are paid weekly, a week is the most natural one. 
The subsidiary questions which will lead best to an estimate 
of annual earnings will be discussed below. The answer to 
the first question, as to the best quantity to investigate, is 
indirect; the only individual measurements we can obtain 
directly are the week’s wages, but these may be supplemented 
by estimates en masse. 

Next, who possess the information we require? Clearly
both employers and employed, and in an ideal census the
answers would be obtained from both groups; but
considerations of simplicity, cheapness, and accuracy are all
in favour of applying to employers alone.

If employés were to be interrogated the procedure would be
as follows. Draw up a form on the analogy of the census form, 
describe very briefly the purpose of inquiry, add a short series 
of concise, lucid, simple questions in suitable type and with 
careful spacing, such as will lead to the minimum information 
required; let these forms be left to be called for, and when 
collected, let the tellers have time and opportunity to examine 
and correct them. It is clear that this method would entail an 
even more expensive organisation than the population census, 
and as the result of experiment it may be doubted whether the 
maximum of accurate information that could be thus obtained 
would come up to the minimum that would be of use. A
partial inquiry can, however, be carried out by means of trade
unions, as was the case in the census of Railway Wages
undertaken by the A.S.R.S. in 1908.

The method of inquiry among employers was as follows:
Suitable blank forms and an explanatory letter were sent by
post to all employers, whose addresses could be found, in the
industries selected for investigation, and the answers were
returned to the central office by post. This is far simpler and
cheaper than the suggested scheme for inquiry among workmen,
requiring far fewer forms and only a small staff of clerks.
With business men it is a simpler matter to post the return
when completed than to keep it for collection by hand. Since
there is no personal intercourse over the matter it is especially
necessary that the questions should be lucid, for the additional
correspondence necessary to rectify errors is a source of worry
at both ends. A copy of one of these forms used in 1886,
abridged only in the number of subdivisions, is subjoined here
and on the following page.

WAGE CENSUS.

Return of the Rates of Wages Paid in Silk Manufactures.

Name of Factory or Firm ________

Address ________

Note. It is requested that the salaries of clerks and managers may be excluded.
The return is of wages of working men only.

Numbers employed on ________ 1886          No. ____
Amount paid in Wages in the year 1885      £ ____
Highest weekly amount paid in 1885         £ ____   Date ____
Number of Hands paid in that week          No. ____
Lowest weekly amount paid in 1885          £ ____   Date ____
Number of Hands paid in that week          No. ____

State the present average rate of pay for overtime: that is, whether
overtime is reckoned as time and a quarter or time and
a half, &c., or in what way reckoned ________

State whether overtime is at present being worked, and how much;
or whether less than full time, and how much less ________

DEFINITION OF UNIT 


Current Rates of Wages and Hours of Labour per
Week of Persons employed in each Branch of the Silk
Manufactures on ________ 1886.

Current Rates of Wages Paid and Number of
Hours of Labour per Week when in full work,
but exclusive of Overtime.

Note. State the Number of Hours of Labour per Week,
whether the Workers were paid by Time or Piece-work,
and if paid by Piece-work give the amount
earned in a week, exclusive of Overtime.

Column headings: Description of Occupation; MALES: Men, Lads and
Boys; FEMALES: Women (18 years and upwards), Girls (under 18 years).
[Sub-headings of these four columns not legible.]

N.B. It is requested that this list of occupations may be
revised where necessary.

Description of Occupation (a line for Time and a line for Piece
against each):

Silk Throwing: Parters, Winders, Cleaners, Spinners, Doublers, &c.
Silk Spinning: Openers and Sorters, Boilers, Dressers, Preparers and Carders, &c.
Silk Weaving: Winders, Warpers, Warp Pickers or Clearers, Doublers, Filers, &c.

The measurement of the annual earnings of groups of
workpeople was one of the ultimate objects of the inquiry.
Annual earnings are composed of many different items, of
which the following are the most important: Ordinary weekly
wages, pay for overtime, special payment for special work
(e.g., of builders if sent to a distance), or at special seasons
(such as the harvest); and payments not in cash, such as free
or reduced house-rent, free or cheap coal, and special goods at
cheap or wholesale prices (such as cloth in textile factories, or
potatoes for agricultural labourers).

When payment in kind is at all general or important, it is 
generally better to proceed on a different method entirely, e.g., 
that followed by the Agricultural Sub-Commissioners of the 
Labour Commission. When it consists of only one simple 
item, such as a house rent-free, it can form the subject of an 
additional question on a form similar to that on p. 32. In the 
silk industry this does not occur ; but this discussion shows the 
necessity of preliminary knowledge on the part of the investigator
before the right form of inquiry can be drawn up.

We have left for consideration the weekly wage, and over- 
time and special payments, the last two of which can be grouped 
together. The ordinary weekly wage is a sufficiently general 
and definable quantity in most subdivisions of most industries. 
A foreman could generally state how much is earned in an 
ordinary full week for each of the hands under him. In many 
cases there is an hourly or weekly sum regulated by a trade 
union, as in the building trades. In others, as in the cotton 
industry, piece-rates are so regulated as to bring out a definite 
sum for the week’s work graduated in relation to the difficulty 
of the task; in general, a very rapid survey of the wage book
will show what the worker in each subdivision will make on an
average. Thus the average weekly wage in an ordinary full
week can be found with considerable accuracy, but this takes
us only part of the way in the calculation of annual earnings;
we need to know in addition to this how many full weeks are
made in the year. It is the method by which this is attempted
on the printed form that is open to most criticism. The
questions used are on p. 32, and afford a good example of the
general difference between the quaesita and the data which are
attainable. The quaesitum is: To how many full weeks' wage
are the annual earnings equivalent allowing for slack weeks
and overtime? The first crucial question to decide is: Are we
to allow for an average loss of time, say two weeks in the year,
through sickness, or are we to allow only for time lost through
failure of work? Since sickness is an individual not a general
misfortune, it will be better to exclude it if possible. Now
overtime in one season, especially if its wages are on
"time-and-a-quarter" or "time-and-a-half" basis, very quickly
tends to balance slack time at another season, though it may
be supposed that it is rarely the case that more than the
normal week's wage is averaged through the year. Thus it
will be logical as well as simple to estimate the year's earnings
as so many normal weeks' wages. For example, if we found
that two weeks were lost through sickness and three through
the mill stopping, and that overtime in one busy month had
added wages equivalent to two normal weeks, we should have
forty-nine weeks' full wage. The figures which will give this
result will be the total sum paid in wages in the factory in the
year divided by the aggregate normal week's wage of the
people dependent on the factory, supposed all at work. Thus,
if 1200 hands (men, women, and children) would, if all at
work, make £1000 in a normal week, and this was the average
number dependent on the particular mill, and if £48,000 was
paid in the year in wages, annual earnings would be equivalent
to forty-eight normal weeks, and earnings would average £40.
Now the total paid in wages is generally kept separate in
business accounts, but the number dependent on the mill for
work is often not known accurately; for the personnel of a
large establishment is subject to continual change, and the
manager would not know whether a person who left went to
another mill or got no work. The total number of all who had
worked there during the year would be too great for this
purpose, and the number at work in a normal week too small.
The number open, perhaps, to least objection is the number at
work in the busiest week of the year; for those absent except
through sickness when trade is busy cannot be said to be
dependent on the factory, but if not at work elsewhere are
among the permanent unemployed; very few workpeople
indeed will be taking their holiday at a busy time, and it may
reasonably be supposed that all the factories in the same
industry will have their busy and slack seasons at nearly the
same time. The answers then to the printed questions (total
paid in year, and number of hands in busiest week) tell us all
we need to know, if we may make this assumption; for then
the total sum paid as wages in the year, divided by the
maximum number employed in the busiest week, gives the
average annual earnings. To find the equivalent number of
normal weeks, multiply the maximum number employed by
the average wage found on the second page of the form, so that
the product shows the aggregate weekly wage if all were
employed, and divide the total paid in the year by this product.
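
The arithmetic of the two illustrations in the preceding paragraph can be set out as a short Python sketch. The function and argument names are hypothetical labels chosen for this illustration; the figures are those of the text (1200 hands, £1000 in a normal week, £48,000 paid in the year).

```python
def full_weeks_in_year(lost_sickness, lost_stoppage, overtime_equivalent, weeks=52):
    """Normal weeks' wage obtained in the year after losses and overtime."""
    return weeks - lost_sickness - lost_stoppage + overtime_equivalent


def annual_earnings_summary(total_wages_year, max_hands, avg_normal_weekly_wage):
    """Return (average annual earnings, equivalent number of normal weeks).

    max_hands is the number at work in the busiest week, taken (as an
    assumption of the method) to be the number dependent on the factory.
    """
    average_annual = total_wages_year / max_hands
    aggregate_normal_week = max_hands * avg_normal_weekly_wage
    equivalent_weeks = total_wages_year / aggregate_normal_week
    return average_annual, equivalent_weeks


# First illustration: 2 weeks' sickness, 3 weeks' stoppage, 2 weeks' overtime.
weeks_example = full_weeks_in_year(2, 3, 2)

# Second illustration: 1200 hands making 1000 pounds in a normal week.
average_annual, equivalent_weeks = annual_earnings_summary(
    total_wages_year=48_000,
    max_hands=1_200,
    avg_normal_weekly_wage=1_000 / 1_200,
)
print(weeks_example, average_annual, round(equivalent_weeks, 2))
```

The second function divides the same total twice: once by the number of hands, to give earnings per head, and once by the aggregate normal week's wage, to give the number of normal weeks.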

The process may be illustrated by comparing the data
obtained in 1886 with those of the more recent census of 1906,
when more information was obtained.

In 1906 some of the particulars obtained were as follows.
The Cotton Industry of the United Kingdom is taken as an
example and the figures relate only to those firms which made
returns. [See Cd. 4545, pp. xxv-xxxvii, 3, 17, 20-28, and (for
the blank schedule issued) 242-4.]

T = Total wages in 1906 = £10,195,229.

W = Average of 12 weekly statements* of aggregate wages
= £204,173.

N = Average of 12 weekly statements of aggregate numbers
= 212,503.

M = Greatest aggregate recorded among N = 213,472.

A_1 = Average earnings of all employed in particular week
= 19.43s.

A_f = Average earnings of those employed in particular week
who worked neither overtime nor broken time = 19s. 7d.

* The last ordinary week in each month.

Hence we have

A = W/N = average earnings in the 12 selected weeks = 19.21s.

E_a = T/N = average annual earnings of the average number
= £47.98.

n_1 = E_a/A = number of weeks' average earnings obtained in
the year = 49.95.

The difference between 52 and n_1, i.e., 2.05 weeks, is
attributable to holidays, which range from 8 to 15 working
days, but includes also stoppages of the factories from any
cause.

E_m = T/M = £47.76 = average annual earnings of the
maximum number, which is taken as the number dependent on
the factories, the variation being due to unemployment.

n_2 = E_m/A = 49.73 = number of weeks' average earnings of
this maximum.

n_1 - n_2 = 0.21 = possible estimate of weeks lost by
unemployment in 1906.

To this should be added an estimate for the number
unemployed in the maximum week.

A_f/A_1 = 1.008. In this case broken time exceeds overtime
on the whole, so that earnings are 0.8 per cent. below those
obtained simply by full-time work.

Applying this percentage to n_2, we obtain 49.34 as the
number of weeks in the year in which full-time earnings could be
obtained by the maximum number recorded as employed.
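
The chain of quotients for 1906 may be retraced with the following Python sketch. The variable names are labels adopted for this illustration only, and the sketch carries every quotient unrounded; the printed figures appear to rest on rounded intermediate values (for instance A taken as 19.21s.), so the results here differ from them by a few units in the second decimal place.

```python
SHILLINGS_PER_POUND = 20
PENCE_PER_SHILLING = 12

# Cotton industry, 1906, as recorded in the text.
T = 10_195_229                       # total wages in the year, pounds
W = 204_173                          # average aggregate weekly wages, pounds
N = 212_503                          # average aggregate numbers
M = 213_472                          # greatest aggregate number recorded
A_1 = 19.43                          # shillings, all employed in the week
A_f = 19 + 7 / PENCE_PER_SHILLING    # shillings, full-time workers (19s. 7d.)

A = W / N                 # average weekly earnings, pounds
E_a = T / N               # annual earnings of the average number
n_1 = E_a / A             # weeks' average earnings obtained in the year
E_m = T / M               # annual earnings of the maximum number
n_2 = E_m / A             # weeks' average earnings of the maximum
ratio = A_f / A_1         # full-time earnings relative to actual earnings
n_full = n_2 / ratio      # weeks of full-time earnings for the maximum

print(f"A   = {A * SHILLINGS_PER_POUND:.2f}s.")
print(f"E_a = {E_a:.2f}   n_1 = {n_1:.2f}")
print(f"E_m = {E_m:.2f}   n_2 = {n_2:.2f}")
print(f"A_f/A_1 = {ratio:.3f}   full-time weeks = {n_full:.2f}")
```

Since n_1 is E_a divided by A, it reduces to T/W, and n_2 to that quotient multiplied by N/M; the number of persons enters only through the ratio of the average to the maximum.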

In 1886 the corresponding totals recorded were T =
£3,148,566 for the year 1885. A_f, ordinary wages in a normal
week in 1886 = 15.2s. M, the greatest number recorded in
1885 = 87,887. Hence E_m = T/M = £35.8, average annual
earnings of maximum number; and if we can take A_f as the
same as A (the average weekly earnings in the year), n_2 =
E_m / A_f = 47.1, the number of weeks' earnings of this
maximum. Here we cannot compare A_f and A_1 for want of
data.

The method is evidently open to criticism from several 
points of view, and is here given rather to illustrate the nature 
of the problem and of the data which may help to solve it, 
than as a complete statement of the relation of normal wages 
to annual earnings. 

In addition to lost time due to holidays and to complete
unemployment in the maximum week, there is lost time due
to sickness, of which an estimate* is an average of 1.7 weeks.

* See Division of the Product of Industry, 1919, by the present author,
p. 30, and Dr. Snow in the Statistical Journal, 1912-13, p. 477.

In the corresponding French wage census, of which the
results were published in 1898,* an estimate of the number of
days' work obtained in the year is formed on a different basis.
The data collected were: (1) The variation each month of the
personnel in each industry, which is found to average 4 per
cent. for the year; that is, for each 100 employed, 96 are found
who have been in the same establishment for as much as
twelve months. (2) The differences between the maximum and
minimum numbers employed in each establishment month by
month during the course of a year, which are found to average
19 per cent. of the (? average) personnel. From this we may
perhaps draw the conclusion that, on an average, half this
number, at least, are in general out of work. (3) The number
of different persons who have been employed in each
establishment at one time or other in the year; this is found
to be 140 for each 100 permanently employed, from which the
legitimate conclusion is that the average number of unemployed
is not so much as 40 in 140, i.e., 28 per cent. These two
percentages, 9 per cent. and 28 per cent., are taken to be the
inferior and superior limits of average lack of work. This
information is more detailed and perhaps more reliable than
that on which the method, used above for the English figures,
is based. Data obtained from syndicates of French workmen
indicate about 20 per cent. as the average want of work; the
English figures obtained by the method described above from
the whole wage census yield about 12 per cent. in 1886.
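
The two limits drawn from the French data can be retraced as follows. This is a sketch under the reading given in the text, namely that half the range between maximum and minimum numbers gives the lower limit, and that the excess of different persons over those permanently employed gives the upper; the function and argument names are hypothetical. The text quotes the limits in whole numbers as 9 and 28 per cent.

```python
def lack_of_work_limits(range_pct, persons_per_100_permanent):
    """Return (lower, upper) limits of average lack of work, in per cent.

    range_pct: average difference between the maximum and minimum numbers
        employed, as a percentage of the personnel.
    persons_per_100_permanent: different persons employed at one time or
        other in the year for each 100 permanently employed.
    """
    lower = range_pct / 2
    excess = persons_per_100_permanent - 100
    upper = 100 * excess / persons_per_100_permanent
    return lower, upper


lower, upper = lack_of_work_limits(range_pct=19, persons_per_100_permanent=140)
print(f"between {lower:.1f} and {upper:.1f} per cent.")
```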

* Salaires et Durée du Travail, 1897.

This somewhat lengthy discussion on the few questions
included on the first page of the form is a good illustration of
the necessity of considerable preliminary study before a blank
form can properly be drawn up. Space does not allow a
detailed criticism of the rest of the form, but it should be
mentioned that the questions relating to individual wages in
1886 were not sufficiently detailed. Thus under "Spinners,
piece" (see schedule, p. 33) in each factory the earnings given
would be an average for all employed, so that the earnings of
individuals were not recorded, and the general distribution
of earnings could only be given approximately. In 1906 the
instruction was "Those earning the same amount may be
grouped together; otherwise each entry should represent only
one person," and the actual variation in each occupation and
industry could be shown.

A careful comparison of the two schedules is recommended,
for it will throw light on many of the difficulties experienced 
in preparing questionnaires. 

Section 3. — Example of an Unofficial Investigation. 

Investigations without official authority do not differ
essentially from those conducted by authority if (as in the
Wage Census) there is no compulsion to answer; but they are
generally more limited in their scope, for want of organisation
or funds, and are at the same time freer to employ the method 
of samples (which is discussed below, Part II, Chapter II.) and 
induced to do so in order to cover an adequate field. 

As an example, we may take the investigations relating to 
the economic condition of the working-classes in certain towns, 
whose results are published in Livelihood and 
Poverty.* The problem to be considered was not 
precisely defined beforehand; in brief, the intention was to 
obtain what information it should be found practicable to get, 
as to the number of earners and dependents in working-class 
families, their earnings and their needs, and to tabulate those 
parts of it which after criticism were believed to rest on trust- 
worthy answers. 

It is generally the case in such investigations that it is 
necessary to obtain the information personally, since people 
are not willing to fill in and return questionnaires unless there 
is some strong inducement (e.g., obtaining sugar) to do so. 
Consequently the forms used need contain few instructions, 
the investigators being specially selected and prepared for the 
work. It was found advantageous to use cards rather than 
paper schedules, and a facsimile is given on p. 40. 

* Published for the Ratan Tata Foundation. G. Bell & Sons, 1915. 

It had, as always, to be considered what facts were actually 
known by the householder or his wife, and what were likely 
to be communicated to a tactful and persistent inquirer. 
Once the wife is engaged in conversation, there is no difficulty 
in eliciting information as to the inhabitants of the household, 
the age of those under twenty, and the occupations (and 
generally the employers) of those at work ; the rent and type 
of house can also be easily recorded (and checked if necessary). 
This information by itself led to valuable tables showing the 
constitution and earning strength of the families, and the 
enormous variation and absence of any standard type in these, 
of a kind that have not been compiled in the census or any 
other official investigation. 
The difficulties were found in assessing family income. 
The wife often does not know the husband’s or the elder 
children’s earnings, and information is not readily given in 
very many cases. Where it was given, it could be verified in 
selected cases by inquiry from the employers, and where tests 
were made it was found that there was no bias in the direction 
of either overstatement or understatement. If, as was generally 
the case, the occupation was correctly stated, it was possible 
to estimate with fair accuracy the normal week's wage from 
the known standard in the town. The distinction on the 
card between “ last week's " and " full time " earnings was 
made because the first was capable of a definite answer, and 
the second was often an estimate. The investigator, having 
both statements, would be able to find the reason for any 
difference, and to establish the second (which was the only 
one used in tabulation) more definitely than if it stood alone. 
It was found necessary to tabulate only the conditions which 
would exist if a full week’s work were done, and to leave 
aside questions of sickness and unemployment. The answers 
to the question as to “ other sources of income " were certain 
to be imperfect; but, so far as they went, they showed the 
means of livelihood of some families whose wages were evidently 
insufficient, and since they erred only by omission they gave 
some positive information. The majority of working-class 
families have only a negligible amount of income-yielding 
property, and the main source of such income, the ownership 
of the house inhabited or of other houses, was generally reported. 

The estimates of earnings were not believed to be sufficiently 
accurate to lead to a table showing the numbers with various 
annual incomes, but they were adequate for the main 
purpose for which they were used. This purpose 
was to find out what proportion of the families 
had an income (apart from charity) to bring 
them above a certain standard, such as Mr. Rowntree's 
minimum standard as calculated by him in Poverty. In the 
great majority of cases there was no doubt, from the constitution 
of the family by age and sex, the number of dependents, and 
the nature of the man's work, on which side of the 
line the household stood. In the doubtful cases (which were 
kept apart in the tables) advantage was taken of all the points 
noted by the investigator (including non-numerical statements 
written on the back of the card, which was reserved for this 
purpose) and a reasonable judgment could generally be made. 

The card was not shown to the informant, but was filled 
in immediately after the conversation. The identity of the 
household was only preserved by the inquiry number. A 
file number was written in to preserve an order after the 
first process of sorting. Each card was criticised, and the 
numbers needed for tabulation were computed, and these and 
abbreviations showing the constitution of the family (such 
as m., s. : w., sc., sc., in., where a man and his son were earning 
and his wife, two school-children, and an infant were dependent) 
were written in the small spaces under the words “File No.” 
The entries for the tables were then obtained by dealing the 
cards into appropriate packs and then counting them; this 
process is rapid, but needs continual careful verification. 
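
The dealing and counting of cards may be pictured in modern terms by the following sketch. It is an illustration only, not the procedure of the inquiry: the four cards, the field names `file_no` and `constitution`, and the choice of packs (number of earners against number of dependents) are assumptions made for the example; only the form of the abbreviations is taken from the text.

```python
from collections import Counter

# Hypothetical cards. The "constitution" string follows the abbreviation
# style quoted in the text: earners before the colon, dependents after it.
cards = [
    {"file_no": 1, "constitution": "m., s. : w., sc., sc., in."},
    {"file_no": 2, "constitution": "m. : w., in."},
    {"file_no": 3, "constitution": "m., s. : w., sc., sc., in."},
    {"file_no": 4, "constitution": "m. : w."},
]


def parse_constitution(text):
    """Split an abbreviation string into (earners, dependents) tuples."""
    earners, dependents = text.split(":")

    def clean(part):
        return tuple(item.strip() for item in part.split(",") if item.strip())

    return clean(earners), clean(dependents)


def deal_into_packs(cards):
    """Count the cards falling into each (earners, dependents) pack."""
    packs = Counter()
    for card in cards:
        earners, dependents = parse_constitution(card["constitution"])
        packs[(len(earners), len(dependents))] += 1
    return packs


packs = deal_into_packs(cards)

# The verification the text insists on: every card lies in exactly one pack.
assert sum(packs.values()) == len(cards)
```

The closing assertion is the simplest form of the continual verification mentioned above: the packs together must account for every card dealt.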

The scope of each inquiry (e.g., the working-class of 
Northampton) needed careful definition. It had to be decided 
whether the town from an economic point of view 
coincided with the administrative borough, and, 
if not, outlying houses must have been definitely included or 
excluded. Next it was necessary to get an accurate list of 
all the houses in the district and apply the method of selection 
by sample to this list ; the inquiry actually dealt with whatever 
was contained in the list used, and the list gives the definition 
of its scope. There is no accepted definition of the “ working- 
class,” and that actually used was in fact determined during 
the handling of the cards. As a preliminary, all the houses 
at first selected which were above a certain rental or whose 
tenants were contained in a directory of principal residents 
were excluded. Of those visited all were excluded in which 
the principal earner was a clerk, teacher, or manager. For 
others, such as shop assistants, commission agents, publicans, 
small shopkeepers, decisions had to be made and recorded as 
the various cases arose. The final definition of the working- 
class households, as understood in the inquiry, was then by 
delimitation, and if given in full would be somewhat as follows : 
all households where the rent was less than 12s. weekly, in which 
the principal earner was not a clerk, teacher, etc. etc. Such a 
process of forming the definition during tabulation is of necessity 
quite common; the decisions should be quite clearly shown 
in the report, and emphasis should there be laid on the treat- 
ment of marginal cases. 
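
A definition by delimitation of this kind amounts to a rule that can be applied card by card. The sketch below is a hedged illustration: the 12s. rent limit and the three excluded occupations are those named in the text, while the function name, its parameters, and the `recorded_decisions` table for marginal occupations are hypothetical. The text does not say how the marginal cases were in fact decided, so no such decisions are built in.

```python
RENT_LIMIT_SHILLINGS = 12  # "less than 12s. weekly", as stated in the text
EXCLUDED_OCCUPATIONS = {"clerk", "teacher", "manager"}  # named in the text


def is_working_class(weekly_rent_shillings, principal_occupation,
                     listed_in_directory=False, recorded_decisions=None):
    """Apply the delimitation to one household.

    recorded_decisions is a hypothetical mapping from a marginal occupation
    (e.g. "publican") to True or False, filled in as such cases arise.
    """
    # Preliminary exclusions: high rental, or tenant listed in the directory.
    if listed_in_directory or weekly_rent_shillings >= RENT_LIMIT_SHILLINGS:
        return False
    occupation = principal_occupation.strip().lower()
    if occupation in EXCLUDED_OCCUPATIONS:
        return False
    # Marginal cases follow whatever decision has been recorded for them.
    if recorded_decisions and occupation in recorded_decisions:
        return recorded_decisions[occupation]
    return True
```

Keeping the marginal decisions in one explicit table is a convenient way of meeting the requirement that they be shown clearly in the report.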

Section 4. — Statistics of England’s Foreign Trade. 

The original schedules which lead to many other statistics 
are interesting, but limits of space must restrict us to one 
more typical inquiry, that which leads to our statistics of 
foreign trade. 

In the population census the filling in of the form is com- 
pulsory and done by the householder; in the wage census 
the answers were voluntary and given once and for all by the 
employer ; in the various inquiries undertaken by the Labour 
Department the answers are voluntary, but in many cases 
periodic, so as to become quasi-official. The method of collec- 
tion of import and export statistics is a blend of all these. 
There are three classes of persons who know the 
facts in question — the sender of the goods, the 
custom-house official through whose hands they pass, and the 
recipient or his agent. Circumstances decide that, in the case 
of exports from the United Kingdom, the exporter or his agent 
sends an account of the quantity and value and place of 
destination, etc., of goods despatched to the Statistical Department 
of Customs ; that, in the case of imports, the receiving-agent 
hands over an account of goods to be landed to the custom- 
house officials, who verify the account, roughly if the goods 
are duty free, carefully if they are liable to duty ; and that, 
in the case of transhipment, the goods are treated in the same 
way as imports at the port of landing, and to some extent 
verified at the port of embarkation. 

The blank forms, being verified by officials as part of their 
duty, or having been filled in by agents thoroughly used to the 
task, need no covering letter, and may be made as complicated 
as necessary ; no questions are inserted but only blank tables. 
An examination of the forms in use will show what are included 
as exports and imports in the Board of Trade totals, and what 
is the total amount of information available for tabulation.* 

The quantities we wish to measure in this investigation are : 
the volume or weight and value of all goods which have an 
exchange value, which leave our shores or reach 
them from without, subdivided as regards classes 
of commodities and countries of destination or origin; the 
values being those at the times of loading or unloading. The 
quantities we can measure are sharply distinct from these, 
being the records of values and volumes which reach the 
Board of Trade. We should therefore examine the forms to 
decide — (1) what part of imports and exports are recorded, 
(2) whether the values are correctly given, (3) the quantities 
accurately registered, (4) the commodities accurately defined, 
(5) the countries of origin and destination accurately 
distinguished in the returns. 

* The following paragraphs do not take cognisance of any changes that 
may have taken place since 1914. 

On reaching port the ship’s master has to send in an 
account, of which an abridged specimen is given 
on p. 45. 

The goods for quick transit are passed at once, and a special 
form is sent to the Customs Establishment similar in character 
to that on p. 46. The remaining goods are treated 
either as dutiable or as duty-free articles. In the 
list before us, ten cases of wine are entered for home use, and 
an account is sent into the Statistical Office ; sixty cases are 
warehoused and another account (as to quality, quantity, and 
value) is sent in ; the whole are registered as imports. Twenty 
of the warehoused cases are removed to another port and 
re-exported ; an account is sent, and they are entered as exports 
of foreign goods. Twenty are put on board ship as stores at 
the port of entry, and ten more removed to another port for 
the same purpose, and of this the central office receives an 
account ; the remainder are removed to another warehouse, 
still in bond, and on leaving that will be treated in one of the 
four ways just mentioned. Other dutiable articles are treated 
in the same way. 
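
The fate of the wine in this example can be set out as a small account. The sketch below merely restates the figures of the paragraph above and works out the remainder, which the text leaves unstated; the variable names are chosen for the illustration.

```python
# Figures taken from the paragraph above; variable names are illustrative.
home_use = 10      # cases entered for home use
warehoused = 60    # cases warehoused in bond
registered_imports = home_use + warehoused  # "the whole are registered as imports"

# Disposal of the warehoused cases so far.
re_exported = 20                    # entered as exports of foreign goods
stores_at_port_of_entry = 20        # put on board ship as stores
stores_at_another_port = 10         # removed to another port as stores

# The remainder is removed to another warehouse, still in bond.
remaining_in_bond = warehoused - (
    re_exported + stores_at_port_of_entry + stores_at_another_port
)

print(registered_imports, remaining_in_bond)
```

On these figures seventy cases are registered as imports and ten remain in bond, awaiting one of the four treatments already described.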

Goods not sufficiently described or not answering to their 
description are opened, their contents entered on a “bill of 
sight,” and an account sent in. Private effects 
are separately examined, being described on a 
“sufferance” form ; if they are bona-fide personal goods no 
record is kept of them, except in the case of dutiable goods, 
which are treated as ordinary imports. If the dutiable goods 
are concealed, either among private effects or merchandise, 
and forfeited, they are not reckoned as imports. 

Bullion is entered on a separate form and kept distinct 
throughout the accounts. 

No. 1.                         REPORT.                        No. 980.* 

If Sailing Vessel or Steamer :  STEAMER. 
Port of X. 

Official No. ; No. of Register ; Date of Registry :      1 
Ship’s Name :                                            Marianne. 
Tonnage :                                                700 
British or Foreign (if British, Port of Registry ; 
  if Foreign, Country to which she belongs) :            B 
Number of Crew (British Seamen ; Foreign Seamen) :       12 
Name of Master, and whether a British or Foreign 
  Subject :                                              H. Hind. 
Port or Place from which arrived :                       Havre, France. 

Cargo. 

Column headings : 
1. Name or Names of Places where laden in order of time. 
2. Marks. 
3. Nos. 
4. Packages and Description of Goods, Particulars of Goods stowed 
   loose, and General Denomination of Contents of each Package of 
   Tobacco, Cigars, or Snuff intended to be imported at this Port. 
5. Particulars of Packages and Goods (if any) for any other Port in 
   the United Kingdom. 
6. Goods (if any) to be Transhipped or to remain on Board for 
   Exportation. 
7. Name of Consignee. 

1.               2.       3.        4 to 6.                                     7. 
Havre, France.   Paris to London.   600 pkgs. Fruit and Perishables.            Smith. 
                                    68 pkgs. Merchandise. 
                 COK      1392 
                 AE       495/6 
                 KG       340/9 
                 FOT      1/50 
                 AJ       3/6 
                 CK       1 
                 AC       10   ) 
                 KL       40   )    70 cases Wine.                              „ 
                 ACD      20   ) 
                 WD       166       5 cases Woollens in transit to Liverpool.   „ 
                 O & D    LL        1 case Brandy.                              „ 

If any wreck fallen in with or picked up, to be stated. 

Stores. 

Surplus Stores remaining on board, viz. :   … Tobacco. 
Number of Alien Passengers (if any) :       Nil. 
Pilot’s Names : 
At what Station Ship lying :                South Quay. 
Agent’s Name and Address :                  C. J. C. 

I declare that the above is a just report of my Ship and of her Lading, and that 
the Particulars therein inserted are true to the best of my knowledge, and that I have 
not broken Bulk or delivered any Goods out of my said Ship since her departure from 
Havre, the last Foreign Place of Loading. 

                                            (Signed) H. HIND. 
Signed and declared this 13th day of October 1896 
In presence of 
(Countersigned)                             pro-Collector. 

* 980th ship at X. since 1st January. 

The duty-free goods, if for transhipment at another port, 
are sent there under seal, and barely examined; they are 
treated at the central office in the same way as 
dutiable transfer goods. The remaining free goods, 
which in general form the bulk of the cargo, are entered on 
such a form as follows, which is worth notice, for it is a 
specimen of the rough material from which our foreign trade figures 
are evaluated. 

ENTRY FOR FREE GOODS.* 

[This space is for the use of the Officers of Customs.] 

Port ________   Dock or Station ________   Importer’s Name ________ 

Examination. 

Ship’s Name.   Master’s Name.   Rotation No.   Date of Report.   Port or Place whence. 
Marianne.      H. Hind.         980.           13/10/96.         Havre, France. 

Marks and     No. of Packages and Description of Goods, 
Nos.          in accordance with the Official Import List. 

COK 1392      One    Goods Manuf. N.O.E. : Billiard Cue Tips 
AF 495/6      Two    Leather Shoes 
KG 340/9      Ten    Cotton Manuf. : Trimmings ; Embroideries ; 
                     Piece Goods, not Muslins 
FOT 1/10      Ten    Gloves of Leather 
 „  11/5      Five   Silk Broad Stuffs 
 „  16/20     Five   Works of Art : Plaster Casts ; Statuary ; 
                     Pictures by Hand 
 „  21/5      Five   Books Bound 
 „  26/30     Five   Bronze Manuf. Ornaments 
 „  31/5      Five   Metal Manuf. Ornamental Brass-headed Nails 
              Five   Silk Manuf. : Dresses, Mantles, Trimmings 
              Ten    Goods Manuf. N.O.E. : Fancy Goods ; Horseless 
                     Carriage ; Brushes ; Glue ; Billiard Chalk ; 
                     Hardware 
              Four   Stationery Ink 
              One    Iron and Steel Manuf. : Machinery, British, 
                     returned 

[Entries of quantity and value, so far as legible : 10 doz. prs., 58 ; 
300 yds., 8 ; 11,340 doz. pr., 12,316 ; 10,400.] 

I enter the above goods as free of duty, and declare 
the above particulars to be true. 

Dated this 13th day of October 1896. 

                                  (Signed) J. Jones, 
                                  Importer or his Agent. 

* In 1904 this form was altered so as to distinguish between “place of 
shipment of goods,” which phrase replaced “whence” in the last heading, 
and “place whence goods consigned,” which is now the heading of an 
additional column. 

The information so received is usually accepted at the 
central office without inquiry. It frequently happens, however, 
that the form is not properly filled in by the agent, the values 
often being omitted. When this is so, it is the 
duty of the clerk at the port of entry to require 
the agent to complete the forms, if imperfect, and to test 
the values by current price lists with which he is provided. 
When there is a palpable error or omission in the form, or when 
the price appears out of the common, a query is sent from the 
central office to the port : e.g., with reference to such a form as 
that just given, the following correspondence might arise : — 


1. Pictures by hand, £10,200. Explain high value. 
Answer. — Correct; invoice was seen ; pictures by Millet. 

2. Books bound : is weight or value incorrect ? Answer.— 
Both correct ; advice seen ; old and valuable books. 

3. Goods entered as “ goods manufactured, chip plaiting ” : 
explain nature, and state if description is correct. Answer . — 
Correct ; wood shaving plaited and occasionally mingled with 
horse-hair, etc. 

4. Potatoes, 40 cwt., £62. Weight or value? Answer . — 
Value correct. Weight should be 400 cwt. 


Thus any unusual entries are liable to be checked and 
verified. 
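
The test of values against a price list can be sketched as a comparison of the value per unit with a range of current prices. In the example below the potato entry is that of query 4 above; the price range, the function names, and the form of the price list are hypothetical, introduced only to show the mechanism.

```python
def unit_value(value_pounds, quantity):
    """Value per unit of quantity, in pounds."""
    return value_pounds / quantity


def needs_query(description, quantity, value_pounds, price_list):
    """Return True when the unit value lies outside the listed range."""
    low, high = price_list[description]
    return not (low <= unit_value(value_pounds, quantity) <= high)


# Hypothetical current price range, in pounds per cwt. (an assumption).
price_list = {"potatoes": (0.10, 0.25)}

# As first entered: 40 cwt. valued at 62 pounds, i.e. 1.55 pounds per cwt.
as_entered = needs_query("potatoes", 40, 62, price_list)

# After the answer from the port: weight should be 400 cwt.
as_corrected = needs_query("potatoes", 400, 62, price_list)

print(as_entered, as_corrected)
```

Under the assumed range the original entry is queried and the corrected one passes. A check of this kind catches only what lies outside the range; goods of exceptional value entered at ordinary rates would escape it.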

In the case of goods not easily valued, or of miscellaneous 
goods not easily tabulated, errors must arise in this way ; and 
another error may enter if an agent or clerk, who 
does not wish to receive too many queries from 
headquarters, enters at ordinary rates goods of exceptional 
value ; but when staple commodities and large quantities are 
involved, all the persons concerned will be familiar with the 
forms they have to fill, the prices will be known, and so in 
important cases errors will be at a minimum. The import total 
values, therefore, are the sum of many quantities of various 
degrees of accuracy, and it is not difficult when looking through 
the list of items in the annual report to see which are specially 
liable to error. Such commodities as old books, works of 
art, goods where sale depends on the fluctuations of fashion, 
racehorses, and so on, have values varying from day to day, 
and their exact value in the balance of imports and exports 
cannot be determined. 

In the case of goods consigned for sale, a class which 
includes the great part of the imports of wool, no value can be 
named by the agent. The goods are then valued at current 
market prices, and in the case of wool at the prices realised 
at the next wool-sales. There is always a possibility of error 
here, since the current prices may not be exactly obtained for 
a particular consignment ; and there is apparently permanent 
overvaluation of wool, since the price at the sales is 
presumably the price of wool landed and warehoused, while the value 
for import records should exclude the cost of unloading and 
moving. 

The quantities and values of exported goods are filled in 
by the shipper or agent, and the papers sent through the 
Custom House officials or directly to the central 
office within six days of the ship's clearing. The 
specification given on p. 49 is an abridgment of the form used : — 

The forms for British and Irish goods are distinct from 
those for foreign, free and duty-paid, goods; and there are 
distinct export forms for transhipments, which have already 
been registered as imports. In these cases the specification 
and quantities are likely to be correct, but there are causes 
which may falsify the values. If they are to be subject to an 
ad valorem duty, they may be undervalued ; if they are adulter- 
ated goods, masquerading as genuine, they may be over- 
valued. It seems hardly possible to estimate these errors. 

We are now in a position to define imports and exports 
according to their meaning in the Board of Trade Returns; 
as, for instance, when for 1913 the value of 
imports is stated as £769,000,000, and of exports 
as £635,000,000, of which £110,000,000 are 
re-exports of foreign or colonial goods. In the following 
statement the details already shown are supplemented from the 
definitions given in recent years in the introduction to the 
Annual Statement of the Trade of the United Kingdom. 


Under imports are included all goods landed through the 
custom-houses, including goods immediately shipped as stores 
or returned from customers unused, with the following 
exceptions : (a) fish of British taking landed in British ships arriving 
direct from the fishing grounds, goods directly imported by 
ambassadors and ministers accredited to this kingdom, old 
vessels bought from foreigners; and (b) sacks, cases, etc., 
used as packages, passengers’ luggage, ships’ stores, ballast, 
and military and naval stores on board Government vessels, 
goods transhipped under bond, and goods in transit through 
the country on a through bill of lading (of which separate 
accounts are given), and goods unlanded and so reported. 

SPECIFICATION FOR BRITISH AND IRISH GOODS ONLY.† 

[Form not reproduced.] 

† A column headed “Final Destination of Goods” has been added … 

Under exports are included all goods entered on ships’ 
bills of lading, excluding the classes after (b) in the previous 
paragraph; new ships leaving our shores sold to foreigners 
are included since 1899. 

Goods immediately reshipped at the same or another port, 
or held in bond and then reshipped, are included in imports, 
and in exports are distinguished as Exports of Foreign and 
Colonial Produce. 

Bullion and coin are not included in the general totals of 
imports or of exports, but are recorded in separate tables. 
Coin carried privately and the great part of diamonds imported 
or exported (a quite important item) are not recorded. 

The treatment of coal throws light on these paragraphs. 
Coal taken for use on the voyage is registered, but not included 
among exports ; coal as cargo is included. 
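
The rules of inclusion stated in the preceding paragraphs may be collected into a small table of categories. The sketch below is an illustration under assumptions: the short category labels and the function `import_treatment` are invented for the purpose, and the exceptions are abridged from the statement above rather than quoted from the official definitions.

```python
# Abridged from exceptions (a) and (b) above; the labels are illustrative.
IMPORT_EXCLUSIONS = frozenset({
    "british-caught fish landed direct from fishing grounds",
    "goods imported by ambassadors and ministers",
    "old vessels bought from foreigners",
    "sacks and cases used as packages",
    "passengers' luggage",
    "ships' stores",
    "ballast",
    "military and naval stores on government vessels",
    "goods transhipped under bond",
    "goods in transit on a through bill of lading",
    "goods unlanded and so reported",
})

# Bullion and coin are kept out of the general totals but tabulated apart.
SEPARATE_TABLES = frozenset({"bullion", "coin"})


def import_treatment(category):
    """Say how a category of landed goods enters the import accounts."""
    if category in SEPARATE_TABLES:
        return "recorded in separate tables"
    if category in IMPORT_EXCLUSIONS:
        return "excluded from imports"
    return "included in imports"


print(import_treatment("wine"))
print(import_treatment("ballast"))
print(import_treatment("bullion"))
```

Any category not named among the exceptions falls by default under imports, which reflects the wording "all goods landed ... with the following exceptions."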

The value of imports reckoned is the nominal exchange 
value just before they are landed, and so includes all payments 
due to foreigners, shippers, underwriters, etc., 
and shipping dues, and none to stevedores, 
dock-labourers, etc. The value of exports is the value “free on 
board.” The exact definition of the values, here and in other 
countries, is of primary importance in studying the balance of 
trade.* 

Great difficulty is experienced in classifying exports accord- 
ing to their countries of destination and imports according to 
their countries of origin ; the details first asked for in 1904 
(see notes on pp. 46 and 49) have led to greater accuracy and 
definiteness on these questions. In the accounts of trade 
there have been since 1904 two sets of tables, and the newer 
ones relating to countries of consignment are now given the 
greater importance.† 

* See the Reports of the Committee of the British Association on The 
Accuracy . . . of . . . Statistics of International Trade, 1904 and 1905. 

† See Committee on Trades Records (Cd. 4346), and compare a current 
Statistical Abstract of the United Kingdom with those issued circa 1920 and 1903. 

Very great care is necessary in using the accounts of foreign 
trade during the war period. The class named above “military 
and naval stores on board Government vessels” excluded 
from the accounts assumed vast dimensions. 

A very good example of an official inquiry is to be found 
in the Census of Production (1907) of which the results were 
published in 1912 (Cd. 6320).* Special attention may be 
directed to the relation between the quæsitum, the ultimate 
object of the inquiry, the data which it proved to be possible 
to collect, and the adjustment of the questions so that the 
answers could readily and accurately be given by the employers 
in various industries. 

* Examples of the Blank Schedules used can be seen at the School of 
Economics. 




CHAPTER IV. 

TABULATION. 


Leaving now the consideration of blank forms of inquiry, 
let us turn to the methods by which our data, accumulated on 
these forms, can be tabulated. At first sight the tabulation 
of so many million census forms, so many schedules of wages, 
and so many lists of goods imported, seems mere office work, 
to be done mechanically, only requiring accuracy and not 
subject to scientific analysis. Tabulation does, indeed, involve 
a great deal of automatic labour; but the determination of 
the exact form of the table and the choice of the headings to 
which the totals shall correspond task the administrative 
statistician, and are worth the closest study. 

The function of tabulation in the general scheme of a 
statistical investigation is sufficiently definite ; it is to arrange 
in easily accessible form the answers to those 
questions with which the investigation is 
concerned. If it is required to know, for instance, the number of 
persons of each sex and age-group in all the districts of the 
country, the figures in the table must show these numbers. 
Or, to take a less definite problem, we want all the information 
possible as to annual earnings. In studying the forms issued 
for the Wage Census, we have seen that the information which 
can be obtained is not precisely that which we require. The 
problem then is so to tabulate our information that our totals 
may give answers as near to our requirements as possible, 
and it can easily be found by experiment that the way to do 
this is by no means obvious. 

Not only must the figures be grouped so as to answer the 
questions put forward in the original scheme, but if the 
information is of wide and varied interest, as in all the 
investigations so far considered, the data must be studied from many 
points of view, and tabulated so that students in all branches 
of knowledge may be able to extract from our tables the 
information they require. Thus the population census is used by 
the financier, the legislator, the merchant, and the commercial 
traveller; political economists turn to it for light on the 
development of industry, and on the change of numbers in 
each trade; those interested in social questions will study 
the ages and sex-distribution in various districts or 
occupations; the sociologist and biologist will need accurate 
information as to the growth of population and the change of age 
distribution. 

To take more specific points, the blue-book which contains 
the tabulation of foreign trade statistics will be expected to 
show how our trade with each country is developing, whether 
we are holding or improving our position in certain markets; 
whether we are exhausting our supply of raw materials; 
whether some new commodity is yet of importance. It must 
be remembered that the original material is not accessible to 
the public, that they are dependent on the information ex- 
tracted for them, and that, though it would be possible to 
turn through all the forms for special data, yet the labour 
needed would be prohibitive, while a little more detail in 
the tabulation might easily have isolated the information 
needed. 

The method of tabulation should be taken in relation to 
the conception of characteristics explained above (p. 20). 
Each person or thing in a group possesses certain 
adequately defined characteristics, say A, B, C, 
and D. They also possess one or other of the 
characteristics E₁, E₂, E₃ . . ., and one or other of F₁, F₂, 
F₃ . . ., etc. A table in single tabulation shows separately 
the totals under each characteristic, E₁, E₂, etc. The heading 
of the table gives directly or by reference the definitions of 
A, B, C, and D, and contains frequently some such phrase as 
“in each locality” if the E characteristic is a locality. Each 
line in the first column then defines an E. A double tabulation 
shows the classification both by E and by F, the heading of 
each column defining an F, so that an entry shows the number 
of persons who possess, say, the characteristics A, B, C, D, 
E₁, and F₁. The horizontal totals show the totals who have 
characteristics E₁, E₂, etc., and the totals of the columns relate 
similarly to F₁, F₂, etc. 
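
A double tabulation of this kind can be sketched as a count over pairs of characteristics, with the horizontal and column totals derived from the cells. The records below are invented for illustration, and the function name and record layout are assumptions; every record is supposed already to satisfy the common characteristics A, B, C, and D.

```python
from collections import Counter


def double_tabulation(records, row_key, col_key):
    """Return cell counts with horizontal (row) and column totals."""
    cells = Counter((rec[row_key], rec[col_key]) for rec in records)
    row_totals = Counter()
    col_totals = Counter()
    for (row, col), count in cells.items():
        row_totals[row] += count
        col_totals[col] += count
    return cells, row_totals, col_totals


# Hypothetical records, each classified under one E and one F.
records = [
    {"E": "E1", "F": "F1"},
    {"E": "E1", "F": "F2"},
    {"E": "E2", "F": "F1"},
    {"E": "E2", "F": "F1"},
    {"E": "E3", "F": "F2"},
]

cells, row_totals, col_totals = double_tabulation(records, "E", "F")

# Both sets of totals must account for every record exactly once.
assert sum(row_totals.values()) == sum(col_totals.values()) == len(records)
```

A single tabulation is the special case in which only `row_totals` is printed.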

For convenience, the methods of tabulation may be divided 
into three groups : A. The simple statement of totals of 
persons or things which satisfy given conditions, 
such as the number living in a town, or the total 
value of imports from France; B. The grouping of a great 
number of units in relation to some particular property pos- 
sessed by all, with the object, not of answering assigned ques- 
tions, but of putting the material in a form ready for use 
in further investigations — e.g., the population according to 
ages, or wage-earners according to the value of their wages ; 
C. The tabulation of non-numerical answers in suitable groups 
to give a view of the whole — e.g., the causes of strikes or the 
state of employment. The division between groups A and B 
is not always definite. 

In the tabulation the convenience of the reader must be 
studied. The table must be so arranged that any totals 
required can instantly be found. This is to a great extent a 
question of typography, the use of suitable founts for figures 
and headings, and also of the choice of the right shape and 
size of page. Supposing the best possible choice made in 
these respects, our rule will then be to get the maximum 
amount of information into a given space. 

Group A. — Thus we can have single tabulation, answering 
one or more groups of independent questions, as : — 

Number and Membership of Trade Unions.* 

Year.     Number of Trade Unions     Total Membership of these 
          at end of Year.            Unions at end of Year. 
1896           1,3[?]7                    1,493,375 
1897           1,307                      1,611,384 
1898           1,267                      1,644,59[?] 

Double tabulation shows the subdivision of a total 
according to two categories, in the example given on p. 55, 
according to sex and age : — 

* Compiled from the Sixth Annual Abstract of Labour Statistics, p. … 

Classification of Paupers in Ireland. — Total Numbers who 
received Relief during the Year ended Lady Day 1892.* 

                              Males.     Females.     Total. 
Under 16 years                44,391      43,648      88,039 
Of 16 and under 65 years     132,370      79,045     211,415 
Of 65 years and upwards       35,121      45,668      80,789 
All ages                     211,882     168,361     380,243 
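
The totals of such a double tabulation are fixed by its cells, and a recomputation of them is a ready check on copying or printing. The sketch below takes the six entries by age and sex from the Irish table and rebuilds the totals; the layout of the data and the variable names are choices made for the illustration.

```python
# The six cell entries of the Irish table (age group by sex).
cells = {
    "Under 16 years": {"Males": 44_391, "Females": 43_648},
    "Of 16 and under 65 years": {"Males": 132_370, "Females": 79_045},
    "Of 65 years and upwards": {"Males": 35_121, "Females": 45_668},
}

# Horizontal totals: one for each age group.
row_totals = {age: sum(by_sex.values()) for age, by_sex in cells.items()}

# Column totals: one for each sex.
col_totals = {
    sex: sum(by_sex[sex] for by_sex in cells.values())
    for sex in ("Males", "Females")
}

# The grand total must be the same whichever way it is reached.
grand_total = sum(row_totals.values())
assert grand_total == sum(col_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```

The agreement of the two routes to the grand total does not prove the cells correct, but a disagreement proves that some figure has been miscopied.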


More information may be included thus : — 

Classification of Paupers in England and Wales. — Total 
Numbers who received Relief during the Year ended Lady Day 
1892.† 

Ages of Persons Relieved.    Indoor.    Outdoor.      Total.    Metropolis.   Other Parts of 
                                                                              England and Wales. 
Under 16 years               111,782     441,805     553,587      100,671        452,916 
Of 16 and under 65 years     232,284     385,299     617,583      148,066        469,517 
Of 65 years and upwards      114,144     287,760     401,904       64,779        337,125 
All ages                     458,210   1,114,864   1,573,074      313,516      1,259,558 


A treble tabulation can be used, subdividing the total 
into three distinct categories, with cross totals for each group. 

.Thus the table on p. 56 gives separate divisions according to 
age, sex, and district; percentage lines, in a distinct type, 
are f also introduced : — 

The same process can be further extended: the example
in the table opposite shows an arrangement for a quadruple
tabulation, distribution by district, date, sex, and industry,
with subsidiary information; but it is generally better to use
two or more tables than to increase the complication, unless
it is necessary to bring several categories into close relation.
Suitable varieties of type will often make comparisons easy in
a very complex table.

* Compiled from the Sixth Annual Abstract of Labour Statistics, p. 102.
† Ibid., p. 101.


























Classification of Paupers by Age, Sex, and Locality.* 
* Ibid., p. 101. † The returns do not distinguish sex under 16 years.
Looking now at the census householders' schedule* (facing
p. 22), we can see that there are about thirteen different items
of information about each person: district, position in family,
condition as to marriage, children, sex, age, occupation,
industry, industrial status, infirmity, birthplace, nationality,
and house-room. These could be tabulated in 78 different
double, 286 treble, or 715 quadruple tabulations, so that there
is plenty of scope for choice.
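
The three counts just quoted are the numbers of ways of choosing two, three, or four categories out of thirteen. A minimal sketch of the check follows; the names ITEMS and tabulation_count are illustrative only and are not taken from the text.

```python
from math import comb

ITEMS = 13  # items of information per person, as counted in the text


def tabulation_count(items: int, categories: int) -> int:
    """Number of distinct tabulations that cross `categories` of the `items`."""
    return comb(items, categories)


# Double, treble and quadruple tabulations respectively.
counts = {k: tabulation_count(ITEMS, k) for k in (2, 3, 4)}
print(counts)
```

The order in which the categories are crossed is ignored here, which is what the figures in the text imply.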

To fix our ideas, we will take occupation as the main
subdivision, and examine Mr. Booth's use of the census
returns, say for London Printers.*

First he gives a treble classification (occupation, sex, and
age), using columns corresponding to 3, 4 and 10 of the 1911
schedule.


                              Females.             Males.
Census Divisions, 1891.       All Ages.    -19.    20-54.     55-.      Total.

1. Printer                      1,316     9,988    21,784    1,921     35,009
2. Lithographer, &c.              809       757     3,037      437      5,040

Total                           2,125    10,745    24,821    2,358     40,049


Then follows a single table, district and numbers, using the 
information on the back of the schedule. 


Distribution.

    E.         N.       W. &c.       S.        Total.

  5,884      9,835      7,577      16,753      40,049


Three simple tables are then given, relating to heads of
families, using columns 2, 3 and 4 (sex), 2 and 14 (birthplace),
and 2 and 12 (industrial status).


Life and Labour of the People , vol. vi., p. 189. 


























His next table uses columns 2 and 10, and is as follows:


Total Population Concerned.

                       Heads of      Others
                       Families.    Occupied.   Unoccupied.   Servants.    Total.

Total                   18,048       16,060       47,257         854       82,219
Average in Family         1            .89         2.62          .05        4.56


The next table (not here given) is a single classification 
according to number of rooms and servants, a most ingenious 
indirect use of the scheduled information; and the last is an 
example of the legitimate use of a quadruple tabulation — 
occupation, industrial status, sex, and age — given on the 
next page. 

It would be difficult to find a better example of tabulation
of a great multitude of details to serve a special purpose. The
census authorities had in many cases not tabulated the
necessary details, and it was necessary
to turn through the original schedules to get at the facts.
For such work as this, the function of tabulation is simply to
provide the answers to definite questions. Thus the census
reports show how many persons of each sex and age-group
belong to certain industries in certain places, in a quadruple
tabulation extending over many pages, each page relating to
one district, and this table may be used for accomplishing
many separate purposes: each item is already a total ready
for use. It is impracticable from limits of time and space,
even if it were desirable, to tabulate all the possible groups of
qualities which can be made from all the statements on each
census form; a good tabulation will aim at providing only
those statements which are of practical use. Thus many
simply descriptive totals are given, such as the numbers of
each sex and age in each parish in the United Kingdom, to
serve primarily for administrative purposes; and many
statements which will afford the economist and sociologist the
opportunity of tracing the progress of industries, of studying
the ages of workpeople in different occupations, the changes
in age-grouping of the nation; and some further tables might




Proportion of Employers to Employed
no doubt be given to throw light on problems of special interest.
In each successive census new tables are to be found.

It is interesting to open one of these great tables of figures,
such as are generally to be found forming the bulk of a
blue-book, and taking a figure at random, ask "Why
is this figure printed, what question does it answer,
to whom can it give information?" For instance, in the
Eighth Report on Trade Unions, p. 257, we find that the United
Brickworkers' and Brick Wharf Labourers' Union spent £20
on funeral expenses in 1894, an average of 3s. j\d. per member. 
As an isolated statement this may interest a very small number 
of persons ; but that small number has a right to expect that 
they shall find the figures relating to their union tabulated 
in a general official book ; to them it may be as important as 
the item, on the same page, of £5,481 spent by the Boiler- 
makers. From this point of view, the question of inclusion of 
such small items is simply one of space. If space is limited, 
a selection would be made of larger quantities only, as being 
likely to concern more people. 

But there is a reason of quite another character for printing
such items as these. The raw material, on which the totals
in such tables are based, is not accessible to the
student except by means of this Report. Now,
the compiler of these statistics cannot know from what
particular point of view they will be studied. It may be desired
to examine and group trade unions according to their
expenditure on different items, to study their history, classifying
them as fighting organisms and as friendly societies. The
tabulations needed cannot well be foretold. The material is
therefore given in the rough, in order that the tabulation may be
made by each student according to his needs. At the same
time the most suggestive totals are given as one of these
possible methods of tabulation; and in the summary of such
a report, the items are retabulated, the rough material
being omitted, in those ways which the editor thinks most
useful.

When space is much too limited for any publication in
extenso of the items, a careful selection must be made of those
to be printed; and it is this selection that is
generally open to most criticism.

The Census supplies an illustration from the County Borough
of Coventry, 1911,* where the following detail is given for
115 persons:


Brick, Cement, Pottery and Glass. Males. 



while all the males (masters, foremen, skilled workmen,
labourers and boys) engaged in the cycle and motor-car
trade are shown in no more detail than:


Vehicles.

[Table: numbers of males at each age (columns 10-, 13-, 14-, 15-,
16-, 17-, 18-, 19-, 20-, 25-, 35-, 45-, 55-, 65-, Totals), in three
lines only: Cycle and Motor Car Makers, Mechanics; Motor Car
Makers, Mechanics; Others.]


It is explained on p. iii of the volume that full particulars 
(by age) of relatively important occupations in a district are 
shown in italics. 

In such cases, two useful rules might be applied: omit all
numbers under, say, 500 when by so doing a line of print
would be saved; and give all numbers over 10,000 correctly
only to the nearest 100, and so for other digits in proportion,
thereby reducing the width of columns of print. If, for example,
we knew to the nearest 100 the exact numbers in each district
and occupation in which as many as 1000 were
employed, our knowledge would be as complete
as we needed; and it is doubtful whether the space occupied
by such a tabulation would be more than that already devoted
to the subject. In many cases, on the other hand, it is essential
to have the raw material quite unchanged. Each tabulation
must be judged on its own merits.

It may be useful to take a particular group of answers, and
discuss what tabulations will throw most light on the questions
at issue. The Poor Law Commissioners of 1833
collected information from a thousand villages in
England and Wales on the following six points


Census Report, Cd. 7019, p. 597. 
among others: the wages of an agricultural labourer in summer
and in winter, both with and without the inclusion of beer as
part payment, his annual earnings, and the subsidiary earnings
of his wife and children. It may be supposed that the chief
object of the Commissioners was to find whether the labourers'
families earned enough for their support, and what proportion
was earned by the wives and children.

The following scheme of tabulation would show in what 
counties the labourer was badly off : — 


                    Average Annual Earnings of
County.           Man.        Family.        Together.



The counties might be taken in alphabetical order for con- 
venience of reference, or in geographical order with subordinate 
averages for groups (e.g., Eastern : Norfolk, Suffolk, Essex) ; 
or the counties might be arranged in the order of the total 
earnings, so that it could be seen at a glance in which counties 
the labourers were worst off. 

To show the number of villages, county by county, in which 
the earnings were below a certain minimum, or within certain
limits, the table given on p. 63 might be used. 

This table can be used in the above complex form or simpli- 
fied. The number of subdivisions of money to be distinguished 
depends on the space at disposal and on the number of villages 
which would be entered in each. A table in which most of 
the entries are 1 or 0 is open to criticism. In the above table 
the villages are too few to allow accuracy in percentage. 

It will be seen that this table would furnish the answer to
almost all questions which could be put as to total earnings.
For instance, if we wish to see the relation between
total earnings and the family's subsidiary
contribution, we should look at the smallest totals in the last
column but one and see if they corresponded with the largest
percentage of family earnings. If we found signs of
correspondence, we should rearrange the counties in the order of
these subsidiary percentages, and see if they were
approximately in order of total earnings also. This is an example of
tabulation to show correlation, the correspondence in the
occurrence of two sets of phenomena.

Another important group of questions arising in connection
with these tables is: What is the relation between weekly
wages and annual earnings, and what proportion
of the wage is generally paid in kind? We shall


Annual Earnings of Men and Families.

                       Number of Villages in which the         Average Earnings in       Family
                       Total Earnings averaged                 County of                 Earnings,
                       (seven ranges; limits illegible)                                  per cent.
                       (1)  (2)  (3)  (4)  (5)  (6)  (7)      Man.    Family.  Together. of Total.

In Norfolk              0    1    3    6    4    3    2       £30      £11      £41       27
  Percentages of Total
  Number of Villages    0    5   16   32   21   16   10½

In Suffolk              0    3    4    5    3    2    2        ?       £11      £39       28
  Percentages of Total
  Number of Villages    0   16   21   26   16   10½  10½

In Essex                1    3    6    7   10    3    1        ?       £10      £38       26
  Percentages of Total
  Number of Villages    3   10   19   23   32   10    3

In Eastern Counties     1    7   13   18   17    8    5      £28 10   £10 10    £39       27
  Percentages of Total
  Number of Villages    1   10   19   26   25   12    7

not now require the statements as to subsidiary family earnings.
In records of agricultural wages the most common statement
was, e.g., "wages in this district are from 10s. to 12s. a week."
Now, a farm labourer did not generally earn as much in winter
as in summer, because wages were reduced to correspond to
the smaller amount of work necessitated by failing light; from
this cause annual earnings will be less than the weekly wage
multiplied by 52. Besides this wage he generally receives
special money at hay and wheat harvests, and also many
payments in kind, such as daily beer, house and ground at
reduced rent, and other privileges. It is generally best to
value all these, and compute his earnings thus:


                                    £   s.   d.
10s. for 38 weeks                  19    0    0
12s. for 9 weeks (summer)           5    8    0
Hay harvest, 1 week                 0   15    0
Wheat harvest, 4 weeks              5    0    0
Beer, 1s. per week                  2   12    0
Cottage and ground                  5    0    0
Other perquisites                   1    5    0
                                   ------------
                                   39    0    0  = 15s. per week.


In this case earnings are 50 per cent, above the general 
weekly wage. An estimate of this nature has been made by 
the late Mr. Little for each county for 1867-70 and 1892. 
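
The valuation above can be checked by working throughout in shillings. The following is only a sketch of that arithmetic: the item values are those of the worked example, while the variable names are illustrative and the 10s. "general weekly wage" is taken to be the ordinary rate of the first line.

```python
SHILLINGS_PER_POUND = 20
WEEKS_IN_YEAR = 52
GENERAL_WEEKLY_WAGE = 10  # shillings; the ordinary rate in the example

# (description, value in shillings) for each item of the year's earnings
ITEMS = [
    ("10s. for 38 weeks", 10 * 38),
    ("12s. for 9 weeks (summer)", 12 * 9),
    ("Hay harvest, 1 week", 15),
    ("Wheat harvest, 4 weeks", 5 * SHILLINGS_PER_POUND),
    ("Beer, 1s. per week", 1 * WEEKS_IN_YEAR),
    ("Cottage and ground", 5 * SHILLINGS_PER_POUND),
    ("Other perquisites", 1 * SHILLINGS_PER_POUND + 5),
]

total_shillings = sum(value for _, value in ITEMS)
pounds, odd_shillings = divmod(total_shillings, SHILLINGS_PER_POUND)
weekly_equivalent = total_shillings / WEEKS_IN_YEAR
excess_per_cent = 100 * (weekly_equivalent - GENERAL_WEEKLY_WAGE) / GENERAL_WEEKLY_WAGE

print(f"Annual earnings: £{pounds} {odd_shillings}s.")
print(f"Weekly equivalent: {weekly_equivalent:.0f}s.")
print(f"Excess over the general weekly wage: {excess_per_cent:.0f} per cent.")
```

The four wage lines account for 38 + 9 + 1 + 4 = 52 weeks, so no week is counted twice.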
The question, Are winter wages generally below summer
wages, and by how much? can be answered by the
following scheme of tabulation, which uses
the data not employed in the previous tables:


                     Average Weekly     Number of Villages where the Excess of
                     Wage in            Summer Wages over Winter was
                                                                            More
Counties.            Summer.  Winter.   Nothing.  6d.   1s.  1s. 6d.  2s.   than 2s.
                      s. d.    s. d.

Norfolk               11  2    10  3       13      2     3      2      ?      ?
  Percentage of Number
  of Villages included                     46      7    11      7      ?      ?

Suffolk               10  2     9  8        ?      ?     6      1      2      1
  Percentage of Number
  of Villages included                      ?      ?    18      3      6      3

Essex                 10  9     9 10       22      0    11      0      5      4
  Percentage of Number
  of Villages included                     52      0    26      0     12     10

Eastern Counties      10  6     9 11        ?      2    20      3     12      8
  Percentage of Number
  of Villages included                      ?      2    19      3     12      8
These examples do not quite exhaust the useful tabulations
of these groups of figures, for we have not yet examined the
distribution of wages, that is the relative numbers paid at
different rates. These returns do not, however, illustrate
such a tabulation well, for we are not told the rates paid to
individuals, but only the rate prevalent in the villages.

Group B. — The grouping according to wages affords an 
example of the second method of tabulation. We have now 
no definite questions to answer, as in the method so far dis- 
cussed, but a more general problem : given a mass of data, it 
is required to tabulate it, so as to present the maximum 
amount of useful information. Our raw material is so many 
thousand isolated statements, which must be focussed, made 
to present definite meaning, and worked up so as to be useful 
for future comparison. 

Some investigations are undertaken not to answer any
definite questions or to throw light on any given problem,
but to collect information which, though it has
no immediate use, is likely to be needed ultimately
by many investigators occupied with
various questions. Such is a wage census. So long as we
have no sufficient account of wages, we are badly informed as
to one of the most important measurements of the social body,
and economists and statisticians are continually hindered by
the want of data essential for their work; but the census has
no immediate practical use, for knowing the height of wages
does not help us directly to regulate that height. In such an
investigation our object will be to examine the figures, and
give all the groupings and averages which seem likely to be
useful for any purpose; and while doing this we shall
imperceptibly pass to a different class of investigation; we shall
be finding a structure underlying our multifarious details; we
shall find that the chaos, which our figures present at first
sight, obeys laws; we shall be making a visible outline, and
giving a definite shape to our apparently featureless mass.

The complete discussion of this problem belongs to a later
chapter; but the tabulation can be begun without special
technique. The examples taken will relate chiefly to wages,
but the methods are quite general.

In the American Report on Wholesale Prices, Wages and
Transportation of 1891, the wages of some 10,000 persons are
detailed. It is proposed to consider their tabulation as a
homogeneous group. The results are given on
pp. 69, 70. In the original publication the wages
are given to half a cent; in the second column, on p. 69, the
numbers of wage-earners are given in 10-cent groups, from
$.25 to $.34, $.35 to $.44, and so on, those earning wages
exactly at the dividing points being always placed in the
division below. Notice that the average wage of such a group
as $2.15 to $2.24 is not $2.20 if the wage-earners are evenly
distributed cent by cent, but the average of $2.15, $2.16, . . .
$2.24, i.e., $2.195.
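
The point about the group $2.15 to $2.24 can be verified directly. This is a small sketch under the text's own hypothesis of one equal share of earners at each whole cent; the function name is illustrative.

```python
from fractions import Fraction


def evenly_spread_average(low_cents: int, high_cents: int) -> Fraction:
    """Average wage in dollars when earners are spread evenly over every
    whole cent from low_cents to high_cents inclusive."""
    wages = range(low_cents, high_cents + 1)
    return Fraction(sum(wages), len(wages)) / 100


group_average = evenly_spread_average(215, 224)
print(float(group_average))
```

The result falls half a cent short of $2.20 because the group stops at $2.24 and does not include $2.25.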

Looking at column 2, we shall see that the figures present 
no order, follow no rule ; no structure has yet been found, our 
divisions are too narrow for our material. 

Now group the wage-earners with wider limits, as in 
column 6, where the numbers earning in half-dollar groups 
are given ; we have here a nearly regular sequence of numbers 
falling after the maximum in the second group. Going back 
to narrower limits, to find exactly at what divisions this regu- 
larity is first in evidence, we have in column 4 the numbers in 
20-cent groups which show considerable, but not absolute 
regularity. The numbers in 30-cent groups * are successively 
75, 355, 674, 1242, 740, 660, 343, 310, 180, 181, 233, 32, 82,
3, 4, 8, 1, almost completely regular except for the large group 
at $3.25 to $3.55. 

The question as to which of these groupings should be 
selected is to be decided by the number of separate items the 
eye can instantaneously grasp. In looking at the 51 numbers 
in the 10-cent groups, or the 26 in the 20-cent, the meaning is 
lost in a maze of figures (though as many details as these
could be properly shown in a diagram), but the 11 numbers 
in the half-dollar groups are easily comprehended. 

Stated in words, the result of our tabulation (column 7) is 
that 6 per cent. of the wage-earners made from $.25 to $.74,
29 per cent. from $.75 to $1.24, and so on.

For the practical work of the tabulation from the original
figures, we should take ruled sheets, enter at the head of
successive columns certain wage limits, and
turning through the items enter each wage by a
dash in its appropriate column, grouping them in fives and
tens, to facilitate addition.

* Vide p. 97, infra.

From the preceding paragraphs it is clear that we do not 
need to take separate columns for each cent from $.25 to
$5.35 for tabulation, but a little consideration is necessary to 
see how minute the limits should be to give the correct average. 
Suppose the entries in cent groups to be : — 


    $1.70      $1.71      $1.72      $1.73      $1.74

    [tally strokes under each head, bundled in fives; 51 entries in all]


The average of the wages so entered can be quickly
calculated as $1.718.

If, on the other hand, we put all the 51 entries as simply
"between $1.70 and $1.74," or more exactly "as much as
$1.70 but less than $1.75," we should naturally take them to
be all (for purposes of averaging) at the middle point of this
group, viz., $1.72.

If we have a sufficient number of items, the differences 
between the average assumed and that calculated for each 
group will be very slight. This is seen on p. 69; column 8 
gives the averages calculated from the entries in 10-cent 
groups, while column 9 gives them on the hypothesis that for 
purposes of averaging the numbers in the half-dollar groups 
may all be taken at the middle points of their groups. The 
difference is greatest in the first and last, the smallest groups. 
The general average obtained from column 9 is $1.70, which is 
the nearest round number to the true average $1.73. Hence,
for the purpose of obtaining the general grouping and average, we
need only take 11 half-dollar columns for marking in our items.

For other purposes it may be advisable to work more 
minutely; for in the lowest group, we shall wish to know how
many are earning $.25, $.30, $.35 separately, for 5 cents is a 
perceptible difference on 25 cents. At the top also it may be 
useful to know the exact wages. 

More minute entries again will be needed for the second
method of tabulation, which is as follows: Suppose all the
wage-earners to be arranged in order of the magnitude
of their wages, those at $.25 at one end, those
at $5.35 at the other. Note the wages of men at
given points in the row. The lowest wage is $.25; one-tenth
of the way along, that of the 512th worker is between $.85 and
$.95, . . . half-way up the wage is $1.50. The figures at each
tenth are given on p. 70. By this means we get a very vivid
idea of the distribution according to wages.

These numbers cannot be obtained accurately if we have
only entered the details correct to half-dollars, but can be
found from the 10-cent grouping, which is therefore the
classification to be adopted. We must first determine in which of
the small groups the men one-tenth, two-tenths ... up the
group lie, and then estimate their position inside the smaller
group. Thus, if we want the figure more accurately than
"between $.85 and $.95," as given above, we proceed as
follows: The 512th man from the bottom is the 82nd man in
the group between $.85 and $.95, for there are 430 earning less
than $.85; this group contains 169; if they were distributed
regularly, 17 to each cent, the 82nd man would be half-way
through this group, between $.89 and $.90. The hypothesis
of even distribution is sufficiently correct for most purposes,
and this method affords a sufficiently accurate means of
determining the wage of the workers at the tenth places. The
resulting figures are given on p. 70. If, however, we want to
know the wage of the half-way man more exactly, we see from
the half-dollar groups that it is between $1.25 and $1.75, a
rough approximation shows it to lie probably between $1.45
and $1.55, and then we rapidly turn through our original data,
isolating the wages at $1.46, $1.47, . . . $1.55.*
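
The estimate of the tenth man's wage is a simple interpolation inside the group that contains him. The sketch below uses the 30-cent group numbers quoted earlier in the chapter and the 169 workers stated for the group $.85 to $.95; the function and variable names are illustrative, and the even spread within the group is the text's own hypothesis.

```python
# Numbers in successive 30-cent groups, as printed in the text
# ($.25-$.55, $.55-$.85, and so on).
THIRTY_CENT_GROUPS = [75, 355, 674, 1242, 740, 660, 343, 310, 180,
                      181, 233, 32, 82, 3, 4, 8, 1]


def wage_at_rank(rank, below, group_size, lower, width):
    """Wage of the worker at `rank` from the bottom, assuming the
    `group_size` workers are spread evenly over a group starting at
    `lower` and `width` wide, with `below` workers in lower groups."""
    position_in_group = rank - below
    return lower + width * position_in_group / group_size


total = sum(THIRTY_CENT_GROUPS)            # all workers tabulated
tenth_rank = round(total / 10)             # the "tenth" man
below_85 = sum(THIRTY_CENT_GROUPS[:2])     # earning less than $.85

first_tenth_wage = wage_at_rank(tenth_rank, below_85, 169, 0.85, 0.10)
print(total, tenth_rank, below_85, round(first_tenth_wage, 3))
```

The same function serves for the other tenths once the group containing each man has been found from the 10-cent table.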

A slight modification of this method is also useful. Take
the average of the lowest 512 (or tenth), namely, $.70; of the
next, namely, $1.03; and so on (see p. 70). These figures also
give a vivid view, and are very convenient for comparisons
with other groups.
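
Because the tenths contain equal numbers of workers, the plain mean of the ten averages should reproduce the general average. A brief sketch of that check, using the averages of successive tenths as printed in the table on p. 70 (the names are illustrative):

```python
from statistics import mean

# Average wage of each successive tenth, lowest first, in dollars.
TENTH_AVERAGES = [0.70, 1.03, 1.18, 1.28, 1.44, 1.59, 1.86, 2.14, 2.59, 3.51]

general_average = mean(TENTH_AVERAGES)
highest_to_lowest = TENTH_AVERAGES[-1] / TENTH_AVERAGES[0]

print(f"General average: ${general_average:.2f}")
print(f"Highest tenth earns {highest_to_lowest:.1f} times the lowest")
```

The small difference from the printed general average of $1.731 is what rounding each tenth to the nearest cent would produce.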

The figures so far apply to only half of the data in the
Senate Report. On p. 70 the whole are tabulated to give
the average wages of the successive tenths. A comparison of
the two groups so obtained shows how far the first half was
typical of the whole.

* On this method see pages 102-7.
Tabulation of Wages: American Figures, 1894

[Table, p. 69. Daily wages are classed as "as much as" a lower
limit and "less than" an upper limit, in 10-cent groups and in
20-cent groups from $.25 to $5.35, with the number of persons in
each group. The only numbers of persons legible here are those
of the following 20-cent groups.]

   As much as     and less than     No. of Persons.

     $.85            $1.05               370
     1.05             1.25               989
     1.25             1.45               557
     1.45             1.65               538
     1.85             2.05               331
     2.05             2.25               310

Average Wage, $1.731
Wages of "Tenth" Men (Deciles).

Lowest Wage                 $.25
1/10th up Group              .89
2/10ths   "                   ?
3/10ths   "                   ?
4/10ths   "                 1.39
5/10ths   "                 1.49
6/10ths   "                 1.75
7/10ths   "                 1.99
8/10ths   "                 2.36
9/10ths   "                 2.98
Highest Wage                5.35


                                       Same for
Average Wage of                        10,000
                                       Workers.

Lowest tenth           $.70              .79
Second   "             1.03             1.00
Third    "             1.18             1.24
Fourth   "             1.28               ?
Fifth    "             1.44               ?
Sixth    "             1.59               ?
Seventh  "             1.86             2.00
Eighth   "             2.14             2.22
Ninth    "             2.59             2.58
Highest  "             3.51               ?

General Average        1.731            1.82


The tabulation of the data collected for the Wage Census
of 1886 on such forms as that on p. 71 illustrates well some of
the difficulties involved. The items given on the main part
of the schedule are of this kind:

                     No.    Average Wage.
Spinners, Time :      6  :      12s.      :  56½ hours.

Such returns are not perfectly definite, for if many are
employed in the same occupation in a mill, it is possible that
they will earn at different rates. Thus this entry
of 6 at 12s. might arise from either 6 men each
earning 12s., or 2 at 10s., 2 at 12s., 2 at 14s. (average 12s.);
or 4 at 12s., 1 at 15s., 1 at 11s.; or 5 at 12s. and 1 at 18s., 12s.
being the general rate, but not the average, in these last two
alternatives. Since the purpose of the wage census was to
give a comprehensive account of wages adapted for use in all
investigations, it should show the numbers in all trades and
subdivisions of employment by age, sex, and district, the
average and general rate of pay for each group, and sufficient
details to show the distribution about the average in each
group, for a mere average may conceal exceptionally high or
exceptionally low wages.
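
The distinction between the average and the general rate can be seen by working the four alternatives. In this sketch the "general rate" is taken to be the most frequent wage, which is an interpretation of the text and not a definition given in it; the names are illustrative.

```python
from statistics import mean, multimode

# The four sets of individual wages (in shillings) that the text offers
# as possible sources of the single entry "6 at 12s."
ALTERNATIVES = {
    "6 at 12s.": [12] * 6,
    "2 at 10s., 2 at 12s., 2 at 14s.": [10, 10, 12, 12, 14, 14],
    "4 at 12s., 1 at 15s., 1 at 11s.": [12] * 4 + [15, 11],
    "5 at 12s., 1 at 18s.": [12] * 5 + [18],
}


def describe(wages):
    """Return (average, list of most frequent rates) for one set of wages."""
    return mean(wages), multimode(wages)


for label, wages in ALTERNATIVES.items():
    average, general = describe(wages)
    print(f"{label}: average {average:.2f}s., most frequent rate(s) {general}")
```

In the second alternative no single rate predominates, so the entry is there an average only; in the last two, 12s. is the prevailing rate while the average lies above it.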

On inquiry at the Labour Department as to whether the
original information had been given in a more detailed form
than the line above, or whether divergencies might be
concealed, the author learnt that the subdivision of occupations
had been carried to such an extent, that in practice, where
there was any great variation in the wages of workers under
one heading, that heading had been split up, so that each




RATES OF WAGES.*

* From Wages in the Minor Textile Trades (C. 6161 of 1890).
group was separately entered, or that several groups were
distinguished under one heading; and that when there was
reason to believe from the light of other returns that this had
not been done, supplementary inquiries were made on this
point, so that the original data were detailed enough for any
requisite fineness of tabulation.

The problem then was to tabulate the answers from the 
various factories in a district, to show clearly and succinctly 
the distribution of wages in each subdivision and in the 
whole. It can hardly be said with confidence that the method 
adopted, of which a specimen is given on p. 71, is entirely 
satisfactory.

To clear our ideas let us suppose that the details on which 
the line relating to throwsters (time) was based were as 
follows : — 


  3 earning 14/        "average minimum rate."
 14    "    15/
  6    "    15/6
 20    "    16/    \
 10    "    17/6   |
 20    "    18/    |   68 within 10 per cent. of the average
  8    "    18/6   |   for all, which is 17/7.
 10    "    19/    /
 10    "    20/6   \   18 earning 20/11 on the average.
  8    "    21/5   /


The process adopted in the tabulation may be supposed to
have been to separate from the whole group of returns a small
group of old men or inferior workers earning far
below the average, and enter them as a distinct
minimum group, and to separate a small group of the most
skilled workers and enter them as a maximum group. This
is better than giving simply the highest and lowest of the
individual wages, for either of these may be due to
exceptional circumstances, and may be quite a long way from that
paid to any other person. The exact size of these extreme
groups must be determined from inspection of the returns
themselves. After this has been done, the remaining wages
may not be grouped close together; in the example taken
they are scattered between 15s. and 19s. To give some clue
as to this distribution the number earning within 10 per cent.
of the average is stated; this is probably the best way if only
one column can be devoted to it, but 10 per cent. is a wide
limit to adopt. Another method would be to give the limits
within which the wages of the 10 per cent. of the earners above
and 10 per cent. below the average were contained: in this
case 16s. and 18s.

If, however, not more than 8 columns are to be devoted to 
each group, the following arrangement would give much more 
definite information, and it could have been made from the 
data in hand, and would be well adapted for all the purposes 
for which it would be required. 

Number employed                                        109
Average weekly rate                                    17/7
One-tenth of the number of wage-earners
    received not more than                             15/
One-quarter of the number of wage-earners
    received not more than                             16/
One-half of the number of wage-earners
    received not more than                             18/
One-quarter received not less than                     19/
One-tenth received not less than                       20/6

This method was used in the publications of the wage- 
census of 1906, except that the tenths were not given. 
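
The figures of the suggested arrangement can be derived mechanically from the supposed detail for throwsters given earlier. The sketch below does so in pence (12 pence to the shilling); the helper names are illustrative, and the rule used for each limit (the first wage at which the running count reaches the required share) is an assumption about how the limits were read off.

```python
# Supposed detail behind the throwsters line, as given in the text:
# (number of workers, weekly wage in pence), lowest wage first.
RETURNS = [(3, 168), (14, 180), (6, 186), (20, 192), (10, 210),
           (20, 216), (8, 222), (10, 228), (10, 246), (8, 257)]

NUMBER = sum(count for count, _ in RETURNS)
AVERAGE = sum(count * wage for count, wage in RETURNS) / NUMBER


def fmt(pence):
    """Write a sum in pence in the text's style, e.g. 17/7 or 15/."""
    shillings, odd_pence = divmod(pence, 12)
    return f"{shillings}/{odd_pence}" if odd_pence else f"{shillings}/"


def limit(fraction, from_top=False):
    """Wage reached when counting `fraction` of the workers from the
    bottom (or from the top, if from_top is true)."""
    running = 0
    rows = reversed(RETURNS) if from_top else RETURNS
    for count, wage in rows:
        running += count
        if running >= fraction * NUMBER:
            return wage


within_ten_per_cent = sum(count for count, wage in RETURNS
                          if abs(wage - AVERAGE) <= 0.10 * AVERAGE)

summary = {
    "Number employed": NUMBER,
    "Average weekly rate": fmt(round(AVERAGE)),
    "One-tenth not more than": fmt(limit(1 / 10)),
    "One-quarter not more than": fmt(limit(1 / 4)),
    "One-half not more than": fmt(limit(1 / 2)),
    "One-quarter not less than": fmt(limit(1 / 4, from_top=True)),
    "One-tenth not less than": fmt(limit(1 / 10, from_top=True)),
}
for heading, value in summary.items():
    print(f"{heading:28} {value}")
```

The count within 10 per cent. of the average, computed on the same data, agrees with the 68 stated in the specimen line.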

After studying Chapters V and VI, readers will naturally
replace the phrases used above by the terms median, quartiles
and deciles, and consider whether one of the measures of
dispersion would not be more appropriate to use than the
details here suggested.

We are fortunately not dependent solely on the tabulation
as given above, for wages in industries as a whole
are also tabulated for 1886 * on the following plan,
which is in a form most useful for purposes of comparison
(p. 74).

The lines giving percentages are very helpful. We can at a
glance compare the levels of wages in different industries. Thus
in the cotton manufacture the average wage is 2s. higher than
in the woollen; and in the cotton there is a large group of 
highly skilled workers earning from 30s. to 35s., while in the 


* More detail is shown in the Reports for 1906.



Number and Percentage of Persons Employed at Various Rates of Wages.*

Table showing the average Normal Wages paid to men in the undermentioned employments, and the
Number and Proportion of men paid at different rates, at October 1886.

* General Report on Wages (C. 6889 of 1893).
woollen nearly half are close to the average, earning between
20s. and 25s. In the jute and linen manufactures the averages
are nearly the same, but in the former a larger proportion are
below the 15s. limit. In the silk manufacture there is an
aristocracy as in the cotton, but it is smaller and better paid,
for 12 per cent. earn more than 35s. This table is a
masterpiece of concentration and clearness.

We will discuss next the tabulation of the figures relating
to CHANGES in RATES of wages collected by the Labour Departments.
The following examples are taken
from the earliest report; the form of the tables
has been modified many times since then, and a
study of these alterations can be usefully followed by turning
through a file of the annual reports. The details collected
on the earlier blank forms show the occupations and numbers
affected, the dates from which the changes took place, and
the wages and hours in a full week exclusive of overtime (a
definition corresponding exactly to that used for the wage
census) before and after the change.


Extract from Table showing the Changes in Rates of Wages and 
Hours of Labour of Ordinary Agricultural Labourers in Various 
Districts of the United Kingdom in 1894, so far as reported to 
the Board of Trade.* 


County and Union.

Particulars of Changes in Summer Wages (1894 compared with 1893): Increase; Decrease.

Particulars of Changes in Winter Wages (1894 compared with 1893): Increase; Decrease.

No. of Male Agricultural Labourers, Farm Servants, Shepherds,
Horsekeepers, Horsemen, Teamsters, Carters, in '91.

Lincolnshire:
Gainsborough
Louth
Spilsby

Norfolk:
Aylsham
Docking
Flegg, East and West
Forehoe

...

Per Week.

1/ (12/ to 11/)
6d. (12/6 to 12/)
1/ (12/ to 11/)

Per Week.

1/ (10/ to 11/)

Per Week.

1/6 (15/ to 13/6)
1/6 (13/6 to 12/)
1/6 (13/6 to 12/)

...

1/ (11/ to 10/)
1/ (11/ to 10/)

2,466 

3 3 ; 9 JI 

a»S T6 
*.487 

1,108 * 
1,448 


* From the second Annual Report on Changes of Wages, pp. 198-9; a little
compressed.

Extracts from Table showing the Changes in Rates of Wages
of Ordinary Agricultural Labourers in Various Districts of the
United Kingdom in the Summer of 1895, so far as reported to
the Board of Trade.*

Column headings: County and Union; No. of Male Agricultural Labourers,
Farm Servants, Shepherds, Horsekeepers, Horsemen, Teamsters, Carters,
in 1891; Particulars of Changes in Summer Wages (1895 compared with
1894), decreases in italics; Weekly Rate of Wages in Summer.

County and Union.                                 Weekly Rate of Wages in Summer.
                                                  1894.            1895.
                                                  s. d.            s. d.
Durham:
  Stockton *                                      17 6             17 0
  Teesdale (Barnard Castle Rural Dist.) †         17 6             18 0

Oxfordshire:
  Headington                                      12 0             11 0
  Henley (Hambleden Rural Dist., Bucks)           12 0 to 14 0     11 0 to 13 0

Norfolk:
  Flegg, East & West                              11 0             10 0
  Forehoe                                         11 0             10 0
  Henstead                                        11 0             10 0
  Mitford and Launditch                           11 0             10 0
  Smallburgh                                      11 0             10 0
  Swaffham                                        11 0             10 0
  Wayland                                         11 0             10 0

Carnarvonshire:
  Carnarvon (Gwyrfai Rural Dist.)
    Labourers without food, advance of 1s.        19 0             20 0
    Labourers with                                11 0             12 0

* Agricultural labourers in this district are hired in March and April for a
year certain, and the change noted applies to the whole year, and not to the
summer only.

† The number of agricultural labourers, etc., is for the Poor Law Union,
but the change applies to the Rural District only.

‡ This number is partly estimated.

The adjoining tables give examples of the way in which
the changes in agricultural wages were tabulated in the Second
and Third Report on Changes in Rates of Wages
and Hours of Labour. In the first table space is
wasted by devoting separate columns to increases

* From the third Annual Report on Changes of Wages, pp. 118, 119, 121
(typography adapted).

and decreases, with the intention of making the table distinct;
while it is not clear whether "Winter 1894" means the winter
beginning in or that ending in that year.

In the second table, which refers to summer wages only,
the columns are rearranged, and increases and decreases are
printed in the same column, the latter in italics. In the Fifth
Report all the information is printed in a clearer way, thus:

Winter Wages.*

Union.       Number.    Weekly Rates.             Increase or Decrease per Week in 1897.
                        Jan. 96.    Jan. 97.      Increase.      Decrease.
                        s. d.       s. d.         s. d.
Tendring     3,113      10 0        11 0          1 0            ...


The tabulation is repeated for the summer. 

The weakness in these agricultural returns is in the numbers
column. In the returns from other industries the numbers
given are those actually affected, but in this case
it is not found possible to obtain this number
correctly, and the number entered is that found under "agricultural
labourers" in the 1891 census, which includes the
various categories as given in the above table. When a change
of wages takes place in a rural district, we may perhaps assume
that it is likely to be general, though, if it was a reduction, it
might not be made by the better employers; and though the
change will not take place in the same week throughout the
district, there is not likely to be much variation in this respect.
The change was generally made at the time that winter wages
gave place to summer, or summer to winter; and a slight
increase or decrease may take place by making the winter
reduction or the summer advance later than usual. On the
whole, little error will be introduced by assuming that the
change stated affects all the adult agricultural labourers in
the district, and it is quite probable that a proportional change †
will take place in the wages of horsekeepers, shepherds, and
others, though it may not in the case of boys, or old men
who are earning less than the district rate. The question,

* From the fifth Annual Report on Changes of Wages, p. 145.

† On these points see Mr. Wilson Fox's Report on Wages and Earnings
of Agricultural Labourers, 1900, p. 50, and pp. 111-157.

"Approximate number of able-bodied labourers in parish?"
is asked on the inquiry form, but as the answers are not used,
it may be assumed that they are generally not given with
sufficient exactness.

The object of the whole tabulation is to show the change in
the national weekly wages bill, but many details are lacking for
the complete calculation. In the case of agricultural labourers,
we need, in addition to these data, accurate statements of the
change in additional earnings, special payments,
and payment in kind. In all cases we need a
more complete account of the whole wage-bill as well as the
change. For agricultural labourers the material has been
published by the Labour Department;* every year it received
returns from most of the 600 unions as to wages at all seasons,
whether there has been a change or not.

The looseness in the returns as to numbers does not prevent
our calculating the change in the county or country rates, for
the numbers in each district affected by the
change may be expected to bear the same proportion
to the numbers given in the census returns, as the
number of agricultural labourers of the same class in the
whole county or country does to the census number, and we
are helped by the principles of weighted averages discussed
in the next chapter.

The calculation for Durham in the above table for the
changes in summer wages 1894-95 may be performed as
follows:

              Average before    Change.    Proportional        Total
              change.                      number affected.    change.
              s. d.                                            s. d.
Stockton      17 6              -6d.       4                   -2 0
Teesdale      17 6              +6d.       7                   +3 6

Total change in county, +1s. 6d.

Proportional number in county, 73.

Effect on county average, 18d./73 = ¼d.
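The Durham arithmetic can be retraced in a few lines. The following is only a sketch: the variable and function names are illustrative, and the figures are those of the small table above (changes in pence, proportional numbers in hundreds).

```python
from fractions import Fraction

# (union, change in pence, proportional number affected, in hundreds)
durham_changes = [
    ("Stockton", -6, 4),
    ("Teesdale", +6, 7),
]
COUNTY_PROPORTIONAL_NUMBER = 73  # whole county, in the same unit (hundreds)


def county_average_effect(changes, county_number):
    """Return (total change, effect on the county average), both in pence."""
    total = sum(change * number for _, change, number in changes)
    return total, Fraction(total, county_number)


total_pence, effect_pence = county_average_effect(
    durham_changes, COUNTY_PROPORTIONAL_NUMBER
)
print(total_pence)          # total change in pence (1s. 6d. is 18d.)
print(float(effect_pence))  # close to a farthing (0.25d.)
```

Exact fractions are used so that the quotient 18/73 is kept as it stands and only rounded when printed.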


* On these points see Mr. Wilson Fox's Report on Wages and Earnings
of Agricultural Labourers, 1900, p. 50, and pp. 111-157.

Here, for simplicity of calculation, the numbers affected are
taken to the nearest 100, a process which is not likely to affect
the average perceptibly.* This rough method is likely to give
the result as accurately as the original data make possible. A
similar process with suitable modifications can be applied to
the changes tabulated for other industries. The summary of
such returns for agriculture for all counties is as follows:


Comparison of the Net Effect of the Changes of Cash Wages
per Week paid in the Years 1896 and 1895 in certain Districts
in England and Wales.†

                                Wages in 1896 as compared        Wages in 1895 as compared
                                with 1895.                       with 1894.

                                Total      Net Effect of         Total      Net Effect of
                                Number.**  Changes on Weekly     Number.**  Changes on Weekly
                                           Wages. Increase (+)              Wages. Increase (+)
                                           and Decrease (-).                and Decrease (-).
District.                                  Total.   Per Head.               Total.   Per Head.
                                           £        s. d.                   £        d.
England:
  Northern Counties             5,662      -43      -0 1¾        3,766      +44      +2¾
  Yorkshire, Lancashire,
    and Cheshire                2,897      +100     +0 8¼        3,942      -126     -7¾
  Eastern and Midland
    Counties                    69,869     +666     +0 2¼        89,576     -2,045   -5½
  Southern and Western
    Counties                    20,901     -340     -0 4         20,441     -575     -6¾
Wales                           ...        ...      ...          2,165      +73      +8

Total                           99,329     +383     +0 1         119,890    -2,629   -5¼

** The number given is the total of male agricultural labourers, farm servants, shepherds,
horsekeepers, in 1891, in the Poor Law Unions in which the changes took place.


* The corresponding calculations for Oxfordshire are:

    12/    -1/    11     -11/
    13/    -1/    16     -16/
                         -27/

  Effect on county average = -2d.

  For Norfolk:

    12/    -1/    134    -134/

  Effect on county average = -134s./425 = -4d.

† From the fourth Annual Report on Changes of Wages, p. xliv.

The value of this table is not obvious. It seems of little
importance to know how many persons were affected altogether;
though it is of some value to learn from
a previous table that 58,578 persons received
increases, and 40,751 decreases in 1896. This total of persons
affected is constantly given in these tables; if a person receives
an increase of 1s. one month, and loses it the next, he is counted
as 2, and his contribution to the next column (net effect of
change) is zero. This -£43 may mean that 2000 persons
received a decrease of 1s. each, and the remaining 3662 (same
or different persons) an increase of 3¾d. each, or any other
figures which would give the same total. The change per
head in the next column is unimportant; it only shows an
arithmetical quotient with no concrete meaning that can be
expressed in words. If it was replaced by another quotient,
viz., -£43/n, where n is the number of agricultural labourers in
the Northern Counties, we should know the effect on average
wages. In fact, the table would be more useful thus:


Approximate Effect of Changes on National Weekly
Wage Bill.

             Increases.              Decreases.              Net        Total No.    Average
District.    No. affected.  Total.   No. affected.  Total.   Change.    Employed.    Change.


The figures given supply an example of the common practice
of carrying out into detail a calculation which depends
originally on incorrect numbers, in this case the number
employed, and is therefore misleading throughout. Till the
average (useless here in any case) is taken, the error in this
quantity has no injurious effect. As shown above, the average
here given could be replaced by another which would be of
use, and which would be correct within limits that could be
defined, and would be narrow enough for most purposes.
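The ambiguity of a net-effect figure can be checked numerically. The sketch below takes the reading suggested in the text for the Northern Counties (2000 persons losing 1s., 3662 gaining 3¾d.) and sets beside it a second, purely hypothetical reading in which all 5662 persons lose the same small sum; both give a net total of about -£43. The function name and the second reading are assumptions made for illustration only.

```python
PENCE_PER_POUND = 240  # 12 pence to the shilling, 20 shillings to the pound


def net_effect_pounds(groups):
    """groups: list of (number of persons, change per person in pence)."""
    return sum(number * change for number, change in groups) / PENCE_PER_POUND


# Reading given in the text: 2000 lose 1s. each, 3662 gain 3 3/4 d. each.
reading_a = [(2000, -12), (3662, 3.75)]

# Hypothetical alternative: every one of the 5662 loses the same amount.
uniform_loss_pence = -43 * PENCE_PER_POUND / 5662
reading_b = [(5662, uniform_loss_pence)]

print(net_effect_pounds(reading_a))  # net effect in pounds, first reading
print(net_effect_pounds(reading_b))  # net effect in pounds, second reading
print(uniform_loss_pence)            # the "per head" quotient, in pence
```

The two readings describe very different experiences for the labourers concerned, yet the summary columns cannot tell them apart.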

Further, since the column of numbers affected is admittedly
wrong, the figures should be given to the nearest 1000 rather
than to units, even if no attempt was made to estimate the
new figure; "between 5000 and 6000 are affected," is a more
useful and correct statement than "5662 persons belonged in
1891 to a class in some undefined way connected with that
in question in 1896."

Since the introduction of the minimum wage in agriculture 
the whole problem has been modified and simplified. The 
foregoing analysis, however, still illustrates the adaptation of 
tabular methods to difficult and imperfect data, and shows 
how records of wage-changes in general were handled officially 
for at any rate twenty years. 

The discussion of Group C, the tabulation of non-numerical 
answers, must be postponed till we have analysed the nature 
and use of averages. 





CHAPTER V. 
AVERAGES. 


It is natural, in a book with the present title, to allot a
considerable space to averages. By the use of averages
complex groups and large numbers are presented in a few
significant words or figures; and thus the two definitions of
statistics, the Science of Averages and the Science of Large
Numbers, are reconciled.

Some writers have attempted to draw a distinction between
averages and means, but no general agreement has been reached
as to the exact senses in which the words are
to be separately applied.* The best distinction
may be made by deciding that an average is a purely arithmetical
conception, such as the average length of life in a
varied population, which does not correspond to any particular
group, but is only a short way of expressing an arithmetical
result; while the word "mean" is to be applied to some
objective quantity, such as the mean height of Englishmen,
about which all height-measurements are grouped in a definite
way. If this terminology is adopted, most of the discussion
under A, B, C in the sequel applies to "averages" and under
D, E and F to "means."

A. Arithmetic Averages. We may rapidly pass by
some of the common uses of the word "average," and pick
out those which will prove of use in statistics. An average is
sometimes used merely to avoid big numbers. The average
weight of the University crew is given, only because it is more
usual to speak of a man's weight being 12½ stone than of eight
men's weight being 12½ cwt., and it is easier to connect the
former with men's weight in general. Similarly, if we are
comparing the value of the exportations of some commodity
* Compare the article "Moyenne," by Dr. Bertillon, in Dictionnaire
encyclopédique des Sciences Médicales, with this chapter. See also the
paper by Dr. Venn in the Statistical Journal, 1891, and chap. xviii. in his
Logic of Chance.



in two periods of ten years each, we should say that the yearly
average in the period 1870-79 was £10,000,000, and 1880-89
was £11,000,000, rather than that the totals were £100,000,000
and £110,000,000. This leads to the second ordinary use of
the word. If we were comparing the ten years
1870-79 with the eleven years 1880-90, and the
totals in the periods were £100,000,000 and £132,000,000
respectively, we should obtain no grasp of the difference till
we had reduced them to a common denominator by dividing
by the number of years, and found that the averages in the
two periods were £10,000,000 and £12,000,000. This class of
averages is well known in cricket; sometimes the total number
of runs made or wickets taken by each cricketer are stated
also, but these are rather as so-called statistical curiosities
than as having much bearing on the skill or luck of the players.
The numbers by which the seasons' performances are judged
are the quotients of the number of runs by the number of
innings, of the number of wickets by the number of runs, and
so on, all quantities being reduced to a common denominator.
The average in this sense is very common in mechanics. The
average pressure per square inch, the average work done by
an engine per minute, the average speed of a train, are quantities
which it is frequently necessary to use. Such an
expression as the average rate of interest is precisely similar.
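The reduction to a common denominator described above amounts to a single division. A minimal sketch, using the export totals quoted in the text (the function name is illustrative):

```python
def yearly_average(total_value, years):
    """Reduce a period total to a per-year figure so periods can be compared."""
    return total_value / years


# Ten years 1870-79 against the eleven years 1880-90, totals in pounds.
average_1870_79 = yearly_average(100_000_000, 10)
average_1880_90 = yearly_average(132_000_000, 11)

print(average_1870_79)  # yearly average, first period
print(average_1880_90)  # yearly average, second period
```

The raw totals differ by 32 per cent., but the yearly averages differ by only 20 per cent., because the second period is a year longer.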

It will be clear that percentage is a special case of this
use of average. It is useless when comparing the growths
of population or of trade to give only the whole
numbers. An increase of 50,000 in the population
of London is not so significant as one of 10,000 in that of
Harrow; they must be expressed as increases of 1 per cent.
and 60 per cent., say, before their meaning can be appreciated,
and this is the same thing as giving the average increase to
100 inhabitants. For this reason the records of births, deaths,
and marriages are always given as rates, so many per 1000
inhabitants; and in these cases a double average is given, for
the rates signify so many per 1000 inhabitants per annum.

Another extension of the same use is found when quantities
are reduced to rates "per head" of the population. This
use is solely for comparison, and the principle employed is
that of the common denominator. It would be futile to state
that the amount spent on drink was, say, £100,000,000 in





1860 and £110,000,000 in 1890; but the corresponding statements
that the amounts were £3 10s. per head in 1860 and
£2 15s. per head in 1890 would make a comparison possible.
In preparing any comparative summary of figures, it is always
necessary to consider whether such an average should be taken.

So far, the averages considered are simply
arithmetical, and satisfy the following definition:

Average × number to which it applies = total quantity dealt with.
e.g. Average weight × number of crew = total weight of crew.

The following question, however, will lead us further. The
average weekly agricultural wages in 1892 in
Wilts, Dorset, Devon, Cornwall, and Somerset
were 10s., 10s., 13s. 6d., 14s., 11s. respectively. What was
the average in the south-west of England?

The simplest method is to say, the average was

$$\frac{10\text{s.} + 10\text{s.} + 13\text{s. }6\text{d.} + 14\text{s.} + 11\text{s.}}{5} = \frac{58\text{s. }6\text{d.}}{5} = 11\text{s. }8.4\text{d.},$$



and for many purposes this would be sufficient; but it does
not satisfy the above definition. For when we ask the double
question "11s. 8.4d. multiplied by what number equals what
total?", we can only answer that 11s. 8.4d. multiplied by the
number of items equals the sum of items.
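The simple average of the five county figures is easily reproduced by working in pence. The sketch below assumes the old coinage of 12 pence to the shilling; the helper name is illustrative.

```python
from fractions import Fraction


def to_pence(shillings, pence=0):
    """Convert shillings and pence to pence (12 pence to the shilling)."""
    return 12 * shillings + pence


county_wages = {
    "Wilts": to_pence(10),
    "Dorset": to_pence(10),
    "Devon": to_pence(13, 6),
    "Cornwall": to_pence(14),
    "Somerset": to_pence(11),
}

total_pence = sum(county_wages.values())
simple_average = Fraction(total_pence, len(county_wages))
average_shillings, average_pence = divmod(simple_average, 12)

print(total_pence)                              # 58s. 6d. expressed in pence
print(average_shillings, float(average_pence))  # shillings, then pence
```

This is the unweighted figure only: each county counts once, whatever its number of labourers, which is why it fails the definition given earlier.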

We must consider further what we understand by the 
expressions “ average wage in each county,” and " average 
wage in the group of five counties.” 

It may be supposed that the average wage in Wilts, for
instance, was compiled by getting returns from different
villages, say 12s., 11s., 9s., 9s. 6d., 10s. 6d., 9s., 9s., adding
them and dividing by the number of villages. This of course
satisfies our definition no better than the former. What is
to be understood by the average in each village? If our
present definition is to be satisfied, it should be the total of
the wages paid in the village divided by the number of workers.
It is hardly necessary to say that this total is never found in
such an investigation, and the average is given from observation
or by guess-work, not by calculation.

If, however, the village average was correct, and we had
returns from all the villages in the county, we should find
the county average as follows:

$$\frac{12/ \times 200 + 11/ \times 150 + 9/ \times 300 + 9/6 \times 150 + 10/6 \times 400 + 9/ \times 200 + 9/ \times 200}{200 + 150 + 300 + 150 + 400 + 200 + 200},$$

where the numbers in the denominator are the numbers of
labourers in the respective villages. We should then have the
same result as if we had had the wages of all the labourers
in the county put down on a sheet, added up, and divided by
their number, and the average would satisfy the definition.
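A short sketch of the village calculation follows. The wages are the seven village returns quoted in the text; the numbers of labourers are read from a badly printed formula and should be treated as an assumption (only 200, 150 and 300 are confirmed by the later remark about 4, 3, 6).

```python
from fractions import Fraction

# Village average wages in pence: 12/, 11/, 9/, 9/6, 10/6, 9/, 9/.
village_wages = [144, 132, 108, 114, 126, 108, 108]
# Labourers per village (partly an assumption, see the note above).
village_labourers = [200, 150, 300, 150, 400, 200, 200]


def weighted_average(values, weights):
    """Total wages paid divided by the total number of workers."""
    total = sum(v * w for v, w in zip(values, weights))
    return Fraction(total, sum(weights))


county_average = weighted_average(village_wages, village_labourers)
unweighted_average = Fraction(sum(village_wages), len(village_wages))

print(float(county_average))      # weighted county average, in pence
print(float(unweighted_average))  # plain average of the seven returns
```

With these figures the weighted and unweighted results differ by a fraction of a penny, which anticipates the later remarks on the small effect of weights.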

It is clear that we can simplify this arithmetical work, 
for if we divide throughout by 50 we get the same result; 
this is as if we said there were 4, 3, 6 . . . labourers in the
villages instead of 200, 150, . . . Thus we get the same
result if we take numbers proportional to the total numbers
of the labourers instead of the actual numbers. This plan
has two advantages: first, that though we do not know the
numbers of labourers, we know numbers nearly proportional 
to them, viz., those included in the census returns under the 
general headings relating to agriculture; and secondly, we 
need not choose our numbers with absolute exactness; thus 
the numbers of labourers above given may be supposed to be 
round numbers substituted for 213, 145, 320 . . . ; and it will 
presently be seen that such differences hardly affect the 
average. We idealise the village, and suppose it to contain 
round numbers ; and then for the numerical work take simple 
numbers proportional to these. This is important as simplifying 
numerical work. 
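Both points of the paragraph above can be tried out directly: dividing every weight by 50 leaves the average unchanged, and replacing round numbers by nearby "actual" ones moves it very little. In this sketch the round numbers are those used for the village illustration (partly reconstructed), and of the unrounded numbers only 213, 145 and 320 come from the text; the remaining four are hypothetical values chosen for illustration.

```python
from fractions import Fraction

village_wages = [144, 132, 108, 114, 126, 108, 108]   # pence
round_numbers = [200, 150, 300, 150, 400, 200, 200]   # labourers, rounded
proportional = [n // 50 for n in round_numbers]        # 4, 3, 6, ...
# 213, 145, 320 are from the text; the last four are hypothetical.
unrounded = [213, 145, 320, 155, 390, 205, 190]


def weighted_average(values, weights):
    total = sum(v * w for v, w in zip(values, weights))
    return Fraction(total, sum(weights))


with_round = weighted_average(village_wages, round_numbers)
with_proportional = weighted_average(village_wages, proportional)
with_unrounded = weighted_average(village_wages, unrounded)

print(with_round == with_proportional)     # scaling the weights changes nothing
print(float(with_round - with_unrounded))  # difference in pence
```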

Averages obtained for the county in this way do not abso- 
lutely satisfy our definition, but are very nearly equal to 
those that do. We can then proceed to take the average for 
the south-west of England on the same principles. 

A common case is when the data are given as so many
instances in successive grades, as in columns 1 and 3 in the
following table. To obtain the arithmetic average it is necessary
to make some assumption as to the distribution of the
instances within the grade. It can be shown
that in ordinary cases, especially where the
numbers tail off rapidly at both extremities, a high degree of
accuracy is obtained by setting out the work as if the numbers
in each grade were concentrated at the middle point of that
grade; in fact the average in each grade is generally nearer
the centre of the group than is the middle point of that grade,
but the resulting errors on either side of the centre tend to
neutralise each other. The work is generally simplified by
taking the breadth of the grade (five years in the table) as
unit, and measuring from an origin selected at the middle
point of a grade in which the entries are numerous (grade
40-45 years, middle point 42½ in the table). The average
distance from this origin (obtained by dividing the total of
column 4 by the number of cases) shows the distance of the
average from the origin in terms of the unit, whence the
average is readily calculated on the original scale.


Ages of Married Men in England and Wales, 1911. 


Col. 1.    Col. 2.               Col. 3.        Col. 4.        Col. 5.     Col. 6.
Grade.     Middle Points of      Number of      Product of     Cumulation.
Years.     Grade measured        Married Men    Numbers in     Limiting    Number above
           from Origin,          per 1000.      Cols. 2        Age.        Age in Col. 5.
           42½ Years;                           and 3.
           Unit, 5 Years.

15-20         -5                    0               0           15         1,000
20-25         -4                   33            -132           20         1,000
25-30         -3                  112            -336           25           967
30-35         -2                  152            -304           30           855
35-40         -1                  154            -154           35           703
40-45          0                  136               0           40           549
45-50          1                  118            +118           45           413
50-55          2                   96             192           50           295
55-60          3                   74             222           55           199
60-65          4                   54             216           60           125
65-70          5                   37             185           65            71
70-75          6                   21             126           70            34
75-80          7                    9              63           75            13
80-85          8                    3              24           80             4
85-90          9                    1               9           85             1
90-95         10                    0               0           90             0

                                1,000          +1,155
                                                 -926
                                                  229

Average: $42\tfrac{1}{2} + \frac{229}{1000}$ of 5 = 43.645 years.
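The short method of the table (origin at 42½ years, unit of 5 years) can be retraced as follows. This is a sketch with illustrative names; the frequencies are those of column 3, and the last lines recompute the average directly from the grade midpoints to show that the change of origin and unit does not affect the result.

```python
ORIGIN_YEARS = 42.5  # middle point of the 40-45 grade
UNIT_YEARS = 5       # breadth of each grade

# Column 2: grade midpoints measured from the origin, in units of 5 years.
deviations = list(range(-5, 11))
# Column 3: married men per 1000 in each grade, 15-20 up to 90-95.
married_men_per_1000 = [0, 33, 112, 152, 154, 136, 118, 96,
                        74, 54, 37, 21, 9, 3, 1, 0]

# Column 4: products of columns 2 and 3.
products = [d * f for d, f in zip(deviations, married_men_per_1000)]
positive_total = sum(p for p in products if p > 0)
negative_total = sum(p for p in products if p < 0)
net_total = positive_total + negative_total
cases = sum(married_men_per_1000)

average_age = ORIGIN_YEARS + UNIT_YEARS * net_total / cases

# The same average computed directly from the midpoints in years.
midpoints = [ORIGIN_YEARS + UNIT_YEARS * d for d in deviations]
direct_average = sum(x * f for x, f in zip(midpoints, married_men_per_1000)) / cases

print(positive_total, negative_total, net_total)
print(average_age, direct_average)
```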


Diagrams illustrating this table are given facing p. 1301 

B. Weighted Averages. This discussion introduces and
gives an example of the very important statistical method
known as "weighting the average." We may illustrate it
further from the same figures by considering what weights to
apply to get this average for South-West England. We may
find the number of agricultural labourers in the counties
and work out the average thus:

$$\frac{10\text{s.} \times 20{,}000 + 10\text{s.} \times 30{,}000 + \dots}{20{,}000 + 30{,}000 + \dots};$$

or we may argue that since we have no means of knowing the
exact numbers of labourers we may as well arrange the weights,
according to the importance of the counties, say 20,000,
30,000, etc., from some other point of view, and take numbers
representing such quantities as the amounts of wheat produced,
the area, or the rate of increase of population. In
this particular case these methods would be absurd, but in
other problems the weights are not so obvious. Suppose, for
example, that we are considering the attraction of London on
the inhabitants of various counties; that we are told that so
many immigrants arrive from Essex, Norfolk, and Suffolk,
and so many from Stafford and Worcester, and we are asked
to compare the attractive power on the agricultural and manufacturing
counties. Should we weight the numbers given by
the total numbers of inhabitants of the contributing counties,
or by their distance from London, or by some quantity derived
from these?

The idea is made clearer by the mechanical analogy in
which the word weight originated. Suppose a uniform weightless
rigid rod graduated in 100 equal divisions,
and equal weights hung at the 40th, 50th, 60th,
70th, and 80th divisions from one end; the rod will then
balance at a point corresponding to the unweighted average,
60 intervals from the same end. Now, suppose the equal
weights replaced by weights of 7, 1, 3, 2, 4 lbs. respectively,
and the rod will balance at a point corresponding to the weighted
average, 57.1 intervals from the same end. The further any
particular mass is moved, or the heavier it is, the more the
centre of gravity will be shifted; and this clearly corresponds
to the influence we should wish the various wages to have in
the statistical problem. The formula in use in Statics,
$\bar{x} = \frac{\sum wx}{\sum w}$, which corresponds to the arithmetic on the previous
page, can also be used in Statistics.
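The balancing point of the rod is the weighted average of the positions. A minimal sketch of the mechanical illustration, with an illustrative function name:

```python
def balance_point(positions, weights):
    """Centre of gravity of weights hung on a weightless rod: sum(wx) / sum(w)."""
    return sum(w * x for x, w in zip(positions, weights)) / sum(weights)


positions = [40, 50, 60, 70, 80]      # divisions from one end of the rod

equal_weights = [1, 1, 1, 1, 1]
unequal_weights = [7, 1, 3, 2, 4]     # lbs., as in the text

unweighted = balance_point(positions, equal_weights)
weighted = balance_point(positions, unequal_weights)

print(unweighted)          # balance point with equal weights
print(round(weighted, 1))  # balance point with weights 7, 1, 3, 2, 4
```

The heavy weight at the 40th division pulls the balance point below 60, exactly as a large number of low-paid workers pulls a weighted wage average down.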

The discussion of the proper weights to be used in this and
other averages has occupied a space in statistical literature out
of all proportion to its significance, for it may be said at once
that no great importance need be attached to the special
choice of weights; one of the most convenient facts of statistical
theory is that, given certain conditions, the
same result is obtained with sufficient closeness
whatever logical system of weights is applied. We must
postpone the complete mathematical analysis of this proposition,
but may offer immediately some algebraic formulae and
arithmetical illustrations.


Write $W_1, W_2, \dots, W_n$ for the weights applied to $n$ quantities $M_1, M_2, \dots, M_n$.

Then the weighted average,

$$M_w = \frac{W_1M_1 + W_2M_2 + \dots}{W_1 + W_2 + \dots}.$$

Let $\bar{m}$ be the average of the M's, and let $M_1 = \bar{m} + m_1$, $M_2 = \bar{m} + m_2$, . . . .

Then $n\bar{m} = M_1 + M_2 + \dots = n\bar{m} + m_1 + m_2 + \dots$, so that $m_1 + m_2 + \dots = 0$.

Similarly if $\bar{w}$ is the average of the W's, and $W_1 = \bar{w} + w_1$, etc., $n\bar{w} = W_1 + W_2 + \dots$, and $w_1 + w_2 + \dots = 0$.

Then

$$(W_1 + W_2 + \dots)M_w = (\bar{w} + w_1)(\bar{m} + m_1) + (\bar{w} + w_2)(\bar{m} + m_2) + \dots;$$

$$n\bar{w}M_w = n\bar{w}\bar{m} + \bar{m}(w_1 + w_2 + \dots) + \bar{w}(m_1 + m_2 + \dots) + (w_1m_1 + w_2m_2 + \dots);$$

$$M_w = \bar{m} + \frac{1}{n}\left\{\frac{w_1}{\bar{w}}m_1 + \frac{w_2}{\bar{w}}m_2 + \dots\right\}.$$

The difference between the weighted average ($M_w$) and the unweighted
average $\bar{m}$ depends therefore on the average of terms such as $\frac{w_1}{\bar{w}}m_1$.

The sum of the w's and the sum of the m's is zero, and of the w's and of the
m's many are negative and many positive. It is only when like signs are
more commonly found in a pair of m and w than are unlike signs that the
whole expression for the difference between $M_w$ and $\bar{m}$ becomes at all
important.
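The identity just derived can be checked on any small set of numbers. The sketch below uses the rod illustration (positions 40 to 80 with weights 7, 1, 3, 2, 4) as its example data, an arbitrary choice made here for convenience, and exact fractions so that the two sides can be compared without rounding.

```python
from fractions import Fraction

M = [40, 50, 60, 70, 80]   # quantities to be averaged
W = [7, 1, 3, 2, 4]        # weights applied to them
n = len(M)

m_bar = Fraction(sum(M), n)   # unweighted average of the M's
w_bar = Fraction(sum(W), n)   # average weight

m_dev = [x - m_bar for x in M]   # deviations m_1, m_2, ...
w_dev = [x - w_bar for x in W]   # deviations w_1, w_2, ...

# Direct definition of the weighted average.
M_w = Fraction(sum(w * x for w, x in zip(W, M)), sum(W))

# Unweighted average plus the correction term sum(w_i * m_i) / (n * w_bar).
correction = sum(w * m for w, m in zip(w_dev, m_dev)) / (n * w_bar)

print(sum(m_dev), sum(w_dev))        # both sets of deviations sum to zero
print(M_w, m_bar + correction)       # the two sides of the identity
```

Here the correction is negative because the largest weight is attached to a quantity below the average, so unlike signs predominate in the products.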

In the following table from the Wage Census (see page facing), $\bar{m}$ is
24s. 2d., $M_w$ = 24s. 7d., $\bar{w}$ = 9400, and writing the weights to the
nearest 100 and the wages in pence we have the following values, the trades
being taken in the order of the table:

   w.       m.       wm.            w.       m.       wm.

  +228     +13     +2,964         +433     +41     +17,753
  + 28     -12     -  336         +149     -41     - 6,109
  - 24     -10     +  240         +186     +36     + 6,696
  - 26     -53     +1,378         - 42     + 7     -   294
  - 66     -58     +3,828         - 32     + 4     -   128
  - 82     - 8     +  656         +323     +19     + 6,137
  - 72     -23     +1,656         + 13     +61     +   793
  - 81     +29     -2,349         - 79     +111    - 8,769
  - 83     + 3     -  249         - 73     + 1     -    73
  - 88     +37     -3,256         - 76     +65     - 4,940
  - 67     -48     +3,216         - 89     +50     - 4,450
  - 91     -36     +3,276         - 91     +75     - 6,825
  +580     -15     -8,700         - 77     +28     - 2,156
  - 44     -92     +4,048         - 65     + 1     -    65
  - 64     +10     -  640         - 10     + 1     -    10
  - 25     -25     +  625         - 76     -46     + 3,496
  - 71     -27     +1,917         - 62     -16     +   992
  - 54     - 4     +  216         - 83     -14     + 1,162
  - 89     -66     +5,874         - 72     +12     -   864


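Sum of 20 positive products, +66,923.

Sum of 18 negative products, -50,213.

Sum of the 38 products = 16,710 = $\sum w_1m_1$.

$$M_w = \bar{m} + \frac{1}{n\bar{w}} \text{ of } 16{,}710 = 24\text{s. }2\text{d.} + \frac{16{,}710}{38 \times 94}\text{d.} = 24\text{s. }6.68\text{d.} = 24\text{s. }7\text{d.}$$

to nearest penny, as in the table.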
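The Wage Census arithmetic can be retraced from the two columns of deviations. In this sketch the lists are transcribed from the table of w and m above (weights in hundreds, wages in pence), taken down the left-hand block and then the right-hand block; the variable names are illustrative.

```python
# Deviations of the weights from their average (94, in hundreds of men).
w_dev = [228, 28, -24, -26, -66, -82, -72, -81, -83, -88, -67, -91, 580,
         -44, -64, -25, -71, -54, -89,
         433, 149, 186, -42, -32, 323, 13, -79, -73, -76, -89, -91, -77,
         -65, -10, -76, -62, -83, -72]
# Deviations of the trade wages from 24s. 2d. (290d.), in pence.
m_dev = [13, -12, -10, -53, -58, -8, -23, 29, 3, 37, -48, -36, -15,
         -92, 10, -25, -27, -4, -66,
         41, -41, 36, 7, 4, 19, 61, 111, 1, 65, 50, 75, 28,
         1, 1, -46, -16, -14, 12]

N_TRADES = 38
W_BAR = 94     # average weight, in hundreds
M_BAR = 290    # unweighted average wage, in pence

products = [w * m for w, m in zip(w_dev, m_dev)]
positive_sum = sum(p for p in products if p > 0)
negative_sum = sum(p for p in products if p < 0)
product_sum = positive_sum + negative_sum

weighted_average_pence = M_BAR + product_sum / (N_TRADES * W_BAR)
shillings, pence = divmod(weighted_average_pence, 12)

print(positive_sum, negative_sum, product_sum)
print(int(shillings), round(pence, 2))   # shillings and pence of the weighted average
```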


The table below affords an example of this
principle,* and is worth careful study. At the commencement
of the Wage Census, circulars were sent to all the
principal firms in all well-located trades, asking
for details as to wages. Of these some were not returned,
and the numbers allotted in the Final Report to each trade
are not the numbers which actually belong to the trade in the
whole country, but the numbers of those in the firms which
made returns. The average wage given is not therefore the
arithmetic average for these trades for the whole country
corresponding to the definition given above for average, but
the average of the average wages as returned in each trade
weighted by the numbers for whom returns were made; so
that the average wage given for the whole group of trades
might have proved to be different, if with the same average in
each trade the returns had been complete. It is very unlikely,
however, that there would have been any great difference.
In the table several systems of weighting are used; the first
are the numbers in these returns, giving an average, 24s. 7d.;
the second are the numbers belonging to each trade according
to the census when they are above a certain minimum, giving
an average 25s. 3d.; the third is a purely arbitrary list of
figures taken from a source which has no connection with
wages, and the average is about 24s. 5d.; the last is the unweighted
average, that is, all the weights are equal, and the average is
now 24s. 2d. These averages are close together, while the
original items vary from 16s. 6d. to 30s. 5d. It is to be noticed
that the true weights are not known in this case, but that
owing to this principle we are able to dispense with them
entirely.

Examples of the Smallness of the Change Introduced by
Difference in Systems of Weighting.

From the Wage Census, 1886.

Trade.                                                    Average Wages (Men).
                                                          s.  d.
Cotton Manufacture                                        25   3
Woollen Manufacture                                       23   2
Worsted and Stuff Manufacture                             23   4
Linen Manufacture                                         19   9
Jute Manufacture                                          19   4
Hemp, &c., Manufacture                                    23   6
Silk Manufacture                                          22   3
Carpet Manufacture                                        26   7
Hosiery Manufacture                                       24   5
Lace Manufacture                                          27   3
Smallwares Manufacture                                    20   2
Flock and Shoddy Manufacture                              21   2
Coal, Iron Ore, and Ironstone Mines                       22  11
Metalliferous Mines                                       16   6
Shale Mines and Paraffin Oil Works                        25   0
Slate Mines and Quarries                                  22   1
Granite Quarries and Works                                21  11
Stone Quarries                                            23  10
China, Clay, &c., Works                                   18   8
Police                                                    27   7
Roads, Pavements, and Sewers                              20   9
Gasworks                                                  27   2
Waterworks                                                24   9
Pig Iron (Blast Furnaces)                                 24   6
General Engineering Iron and Brass Foundries
  and Machinery Trades                                    25   9
Shipbuilding, Iron and Steel                              29   3
Tinplate Works                                            33   5
Saw Mills                                                 24   3
Brass Works and Metal Wares                               29   7
Shipbuilding, Wood                                        28   4
Cooperage Works                                           30   5
Coach and Carriage Building                               26   6
Boot and Shoe Making                                      24   3
Breweries                                                 24   3
Distilleries                                              20   4
Brick and Tile, &c., Making                               22  10
Chemical Manure Works                                     23   0
Railway Carriage and Wagon Building                       25   2

The problem dealt with in the next table is to find the
average weekly agricultural wage in England and Wales from
the returns for Michaelmas 1869 and Lady Day 1870, given in
columns 1 and 2. There are very many different ways of taking
this average, some of which are as follows: Take the average of
summer and autumn for each county, as in column 3, and then
the unweighted average of these 45 numbers; this is 12s. 7d.
Suppose the summer wage to be paid twice as long as the
autumn wage, as in column 4, and proceed as before; the
average is 12s. 5½d., the slight difference being due to the
inclusion of harvest payments in the Michaelmas wage, which
makes them higher on the whole than the summer wages.
Again, divide the counties into geographical groups, take the
simple average for each group (the figures marked a in column 3
and b in column 4) and weight these by the figures marked c
in column 5, the numbers of agricultural labourers in each
group; the average of the a figures with the c weights is 12s. 5d.,
of the b figures with the c weights is 12s. 4d. Again, weight
the figures for each county in column 4 with the numbers in
column 5, the most obvious method of all; the average is then
12s. 4d. Again, take the simple average of the district averages
a and b, that is, give each of the eight districts equal weights;
the averages are 12s. 4½d. and 12s. [?]½d. Or take the simple
average of column 3, counting Yorkshire and Wales each as
one county; it is 12s. 8d.

* From the Statistical Journal, December 1897, with corrections.

To obtain new groups, take as weights not the number of
agricultural labourers, but the total population of the districts,
the numbers marked d. Exclude the population of London as
exerting a preponderating influence unconnected with
agriculture. A new factor is now introduced, for population is
greatest in the manufacturing districts, where agricultural
labour is of comparatively little importance, but receives high
wages; these high wages have undue weight, and the average
of the figures b with weights d is brought up to 13s. 1½d. If
column 4 is rewritten correct only to the nearest 1s., and
column 5 to the nearest 10,000, the weighted average is 12s. 5d.
If column 3 is weighted with random numbers quite unconnected
with the problem, viz., the successive digits in the third
decimal places of the logarithms of the numbers 2 to 46, the
average is 12s. 10½d. The reader may try any other system
of logical or absurd weights, and he will find that unless there
is some bias in the selection of weights, or great preponderance
is given to a few counties, the average will be little affected.
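The experiment recommended to the reader can be tried in a few lines. What follows is only a sketch in Python: the eight wages (in pence) and all four systems of weights are invented for the illustration and are not taken from the table, and `weighted_average` is a helper name introduced here.

```python
def weighted_average(values, weights):
    """Return sum(w * v) / sum(w) for paired values and weights."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical weekly wages in pence for eight districts (not from the text).
wages = [138, 144, 150, 147, 156, 141, 153, 149]

# Four systems of weights, all invented for this illustration.
weight_systems = {
    "equal": [1, 1, 1, 1, 1, 1, 1, 1],
    "labourers": [31, 18, 25, 40, 12, 27, 22, 35],  # stand-in for true weights
    "rounded": [30, 20, 30, 40, 10, 30, 20, 40],    # the same, to the nearest ten
    "arbitrary": [7, 3, 9, 1, 4, 8, 2, 6],          # digits unconnected with wages
}

averages = {name: weighted_average(wages, w) for name, w in weight_systems.items()}
spread = max(averages.values()) - min(averages.values())

for name, value in averages.items():
    print(f"{name:>10}: {value:.2f}d.")
print(f"spread of averages: {spread:.2f}d.; "
      f"range of items: {max(wages) - min(wages)}d.")
```

With these invented figures the four averages lie within about a penny of one another, while the items themselves range over eighteen pence. A system of weights that favoured the high-wage districts would of course move the result further.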

Since the true system of weights which would reduce the
general average to our definition must be allied to some of
those here adopted, and can hardly show greater divergency
from 12s. 4d. than these do, we may feel confident that the
true average is within, say, 3d. of this figure. The original
items varied from 8s. 6d. to 19s.; the averages, even those
based on the most extravagant methods, are contained by the
limits 12s. and 13s. 1½d. Without some such argument as
this we should have no clue to the magnitude of the error
introduced by erroneous weights. It is never safe, however,
to assume that weights can be neglected, and an unweighted
average used, without first examining the group in question,
trying various systems, and seeing that the resulting average
is stable. This will only be the case if there is no connection
between the size of the quantities and the true magnitude of
the weights. Thus if we are dealing with wages in towns, and are
calculating the average for all towns taken together, we shall
obtain too small a result if we ignore weights and count all
towns as equal, for the higher wages are paid in the larger
towns. Thus, as on pp. 118-9 below, the average of the
recognised wages of 117 branches of the Amalgamated Society
of Engineers was 32s. 4d. in 1891 if we count all the branches
as equal; but was up to 33s. 4d. if we weight the wage at each
of the branches with the number of members belonging to it.
But, though we cannot neglect weights entirely in such cases,
we need to make only a very rough estimation for them if
there is no preponderating influence exerted by a small minority
of places. In this case London, with a wage higher than any
other district, except Dartford and Enfield Lock, and with
nearly one-sixth of the total number of members dealt with,
exerts such an influence. If, giving London its due importance,
we take as weights the numbers belonging to the branches
to the nearest hundred, we obtain the average 33s. 6d.,
practically the same as before. Each group for which an average
is to be calculated must be treated on its merits; in many
cases the weights may be neglected entirely; in nearly all
cases, where the group consists of many items, even moderately
large errors in computing weights may be neglected.
Examination of the data will generally determine the importance
of such errors.

This principle is of great importance. In many cases the
true weights are incalculable or even undefinable; but now it
is seen that, given certain conditions, there is no need to
calculate or define the weights; in many other cases the weights
cannot be known exactly, but exactness is not necessary. No
system of weights, however, can remove an original bias
common to all the items. If, for example, wages throughout
were 1s. less than here reckoned, the calculated average would
be 1s. too high. So we arrive at a very important precept:
in calculating averages give all care to making the items free
from bias, and do not strain after exactness in weighting.
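The precept may be checked numerically. In the sketch below (Python; every figure is invented, and `weighted_average` is a helper introduced here) each return is overstated by 1s., that is 12d., and the error in the average is found to be the same 12d. whatever weights are applied.

```python
def weighted_average(values, weights):
    """Return sum(w * v) / sum(w) for paired values and weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


true_wages = [200, 230, 260, 290]           # hypothetical wages in pence
bias = 12                                   # every return 1s. (12d.) too high
reported = [wage + bias for wage in true_wages]

# Several systems of weights, sensible and otherwise (all invented).
weight_systems = [[1, 1, 1, 1], [5, 1, 1, 1], [1, 2, 3, 4], [9, 7, 2, 6]]

# Error of the calculated average under each system of weights.
errors = [
    weighted_average(reported, w) - weighted_average(true_wages, w)
    for w in weight_systems
]
print(errors)
```

The list of errors contains the same value, the common bias, for every system, which is why care spent on the weights cannot make good a fault in the items.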


C. Statistical Coefficients. — A statistical coefficient is a 
number, whole or fractional, by which a total (e.g., population) 
must be multiplied to give an allied number (e.g., number of
births). Thus if the birth-rate is 28 per 1000, the coefficient
is .028. These coefficients play an important part in ordinary
statistics and a very interesting role in the application of the
law of error to demography. The population may increase
or diminish, but the coefficients relating to certain numbers
fluctuate within narrow limits and only after a considerable
period show any significant change in normal times; and by
their use the statistics of different countries can be compared,
and numbers for future years can be forecasted in some cases
with marvellous accuracy, subject only to the chance of some
great catastrophe. Coefficients can be formed for births (in
various districts), for deaths (according to age, profession, or
disease), for marriages (at various ages), for suicides, crimes,
accidents, consumption of various commodities; if the
preliminary data could be obtained, for the number of persons
crossing Westminster Bridge in the year, the number of visitors
to the Monument, the number of umbrellas left in the train,
and so on; the list could be prolonged indefinitely. The more
important coefficients are calculated for most civilised countries
and published in statistical reports. A knowledge of them is
necessary for statistical investigations.

It is clear that such coefficients are essentially only a special 
way of writing a certain class of arithmetic averages, and with 
reference to them we may discuss more generally the relation- 
ship between the terms used on p. 84. 

Average (A) × number to which it applies (N) = total
quantity (Q) dealt with,

or $A = \frac{Q}{N}$, $Q = N \times A$.

Thus in the case of births A is the coefficient, N the popu- 
lation, Q the number of births. 
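The relation between A, N and Q may be put in a couple of lines. This is a sketch in Python using exact fractions; the birth-rate of 28 per 1000 is the figure used above, while the population of 40,000 and the function name `coefficient` are assumptions made only for the example.

```python
from fractions import Fraction


def coefficient(total_quantity, number):
    """A = Q / N, kept as an exact fraction."""
    return Fraction(total_quantity, number)


birth_coefficient = Fraction(28, 1000)   # birth-rate of 28 per 1000, i.e. .028
population = 40_000                      # hypothetical population N

# Q = N x A: the number of births implied by the coefficient.
expected_births = population * birth_coefficient
print(expected_births)                   # births implied for this population

# Going the other way recovers the coefficient from Q and N.
print(float(coefficient(expected_births, population)))
```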

So far as is practicable, a movement of Q should reflect
change in only one factor. If N is the whole population, Q
will be affected by changes in the sex and age distribution of
the population, and by the number of marriages and age at
marriage, as well as by fecundity. Methods of securing strict
comparability between the denominators in the cases of birth,
marriage and death rates (by means of correcting factors *)
are in common use. When these methods are not applicable
we may fall back on the rule given by Bertillon (Cours
élémentaire, pp. 94 seq.): effects (Q) should be compared with
their immediately productive causes (N); thus in the case of
marriages, the question should be put "what persons are capable
of marrying?" and the answer is adult bachelors or spinsters
or widowers or widows, and the total of these groups gives N.
The rule may be extended to include persons or things indirectly
concerned or affected; thus the output of coal may be
considered in relation to coal-hewers (the immediate producers)
or to all employed at coal-mines, and the output of domestic
coal in relation to the number of private consumers.† To
eliminate all factors but one, the entries in the numerator
should be homogeneous, the entries in the denominator should
be homogeneous, and the potential relation of a person or
thing included in the denominator to one in the numerator
should be uniform. For example, the average value of exports
per head of the population satisfies none of these conditions;
exports make a heterogeneous mass, the population consists
of both sexes and all ages, and only part of the productive
power of the nation is directed to the foreign market.

The crude coefficients and averages, however, have their
use; if they change, some factor or factors have changed, and
if it is known that all but one are nearly constant, the
coefficients move with an identified factor. Thus if N is the
population, n the number of marriageable persons, and M
the number of marriages, the crude coefficient is given by

$$C = \frac{M}{N} = \frac{M}{n} \times \frac{n}{N};$$

if $\frac{n}{N}$ is constant, C varies with $\frac{M}{n}$, the more
logical coefficient.
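The factorisation of the crude coefficient can be verified directly. The figures for the two years below are invented, chosen so that the proportion of marriageable persons n/N is the same in both; the dictionary keys follow the symbols of the text.

```python
from fractions import Fraction

# Hypothetical figures for two years; n/N is one quarter in both.
years = {
    "first":  {"N": 1_000_000, "n": 250_000, "M": 8_000},
    "second": {"N": 1_200_000, "n": 300_000, "M": 10_800},
}

crude = {}     # C = M / N
logical = {}   # M / n, the more logical coefficient
for name, y in years.items():
    crude[name] = Fraction(y["M"], y["N"])
    logical[name] = Fraction(y["M"], y["n"])
    share = Fraction(y["n"], y["N"])
    # The identity C = (M / n) x (n / N) holds exactly.
    assert crude[name] == logical[name] * share

# With n/N constant, C changes in the same ratio as M/n.
crude_ratio = crude["second"] / crude["first"]
logical_ratio = logical["second"] / logical["first"]
print(crude_ratio, logical_ratio)
```

Both ratios come out equal, so in this case a movement of the crude coefficient may safely be read as a movement of M/n.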

D. The Mode. — We pass to the consideration of two other 
means in common use among statisticians but unfortunately
not yet consciously introduced into common parlance. There
are, however, some popular phrases which, if they have any
definite meaning, very nearly resemble the averages in
question. When we hear of the average clerk, the
average working-man, the phrases admit many
interpretations. In some way these persons are supposed to
be types of their kind. The average clerk may be supposed
to mean the one who receives the average income of all clerks,
whose expenditure on necessaries and on luxuries is the
average of all of his class, who takes the average amount of
interest in his work, is of average ability and average age. It
will be seen that this clerk is ideal, and not to be found in any
random assembly of half-a-dozen; for each of these will have
some peculiarity, some quality in which he differs from the
average; the average man of the newspapers does not exist
in the flesh, but is an imaginary person to whom certain
attributes are attached.

* See Elementary Manual of Statistics (by the present author), pp. 105-7,
and Statistical Journal, 1906, pp. 34-147.

† See a discussion on homogeneity, comparability and relativity, Statistical
Journal, 1908, pp. 463-8.

Quetelet's average man is familiar;* he is of average height,
weight, strength, girth and lung capacity, with eyes of normal
range and medium tint; but he is a more satisfactory model
than the newspapers' average, for
in regarding him we see the type from which all other men
may be supposed to have deviated; the creature that would
have been produced if all disturbing causes were removed.
That any actual person should answer exactly to all these
standards is of course in the highest degree improbable.

Quetelet refers neither to the arithmetic average, nor to
the median or the mode (defined in the sequel), but to a mean
about which all the similar measurements are grouped in
accordance with a definite law, the obedience of
anthropometrical measurements to which was his chief theme.

The newspaper average, on the other hand, seems to be
the mode, the position of the greatest density, which may be
explained as follows: Referring back to the table
of American wages, p. 69, or to the table on next
page, it will be noticed that in looking down column 2 we find
the numbers increase till we come to 685 (between $1.15 and
$1.24), and then after fluctuations diminish. This number,
685, is the greatest which occurs in any 10-cent group.

* See Quetelet's Physique Sociale; and Edgeworth in Statistical Journal,
December 1893.

Determination of the Mode.

Numbers of Wage-Earners from the Senate Report, 1893, U.S.A.

The value of the graded quantity in a statistical group (of
wages, heights or some other measurable quantity) at which
the numbers registered are most numerous is called the
mode, or the position of greatest density, or the predominant
value. In the case of a group that is represented by a
continuous curve the value is the abscissa of the maximum
ordinate.

In this column 2 we have, however, 14 maxima in the
correct sense of the word, the numbers rise and fall with little
regularity, and there are 14 modes of which that at
$1.15-$1.24 is the most pronounced. But if the
groups are made wider, and the numbers entered
as in column 6 in half-dollar limits, there are only three modes,
or if we neglect the small group of 8 at $5.00 only two. The
position of the largest group of 1472 is not at once assignable
more closely than as between .75 and 1.25.

A further method of approximating to the mode may be
illustrated as follows: When the numbers are tabulated in
10-cent groups, as on p. 97, the mode is quite indeterminate;
in 20-cent groups the successive numbers beginning at $.25-$.44
are 16, 144, 270, 370, 989, 557, 538, 531, etc., and the number
989 (in the group $1.05-$1.24) is a distinct mode; if we begin
the 20-cent groups at $.35-$.54, the numbers are 74, 242, 282,
505, 784, 924, 274, etc., and 924 (in the group $1.35-$1.54) is a
mode; by this double tabulation it is seen that the 20-cent
grouping does not decide the mode. In 30-cent groups we
have 355, 674, 1242 ($1.15-$1.44), 740, etc., if we begin with
$.55-$.84; we have 439, 1190 ($.95-$1.24), 1023, etc., if we
begin with $.65-$.94; and 483, 1088 ($1.05-$1.34), 996, etc., if
we begin with $.75-$1.04: the mode by each of these groupings
lies in a group which contains $1.15 to $1.24, and this smaller
group may be assumed to contain the mode, which is thus at
or near $1.20. The example here taken is drawn from a group
of very irregular figures, which specially illustrate the
difficulties. The method just adopted may be summarised thus:
Tabulate the figures again and again in gradually widening
groups till regularity is obtained; then examine again the
groups which have the selected width and see if the mode is
shifted when the lower limit of the grouping is moved; if it
is shifted the groups are not wide enough; if it is not, the mode
is in the smallest group common to the larger equal groups
which all contain it. A diagrammatic method is described
on p. 138.
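The method just summarised can be sketched in Python. The 10-cent counts used below are not quoted in the passage: they are inferred here by differencing the 20-cent and 30-cent totals that the passage does quote, and they stop at $1.84 because the passage gives nothing beyond its "etc." (it is assumed that no higher group would displace the maxima found). The function names are introduced only for this illustration.

```python
def regroup(counts, width, offset):
    """Sum `counts` in consecutive groups of `width` bins, starting at bin
    `offset`. Returns (first_bin, last_bin, total) for complete groups only."""
    return [
        (start, start + width - 1, sum(counts[start:start + width]))
        for start in range(offset, len(counts) - width + 1, width)
    ]


def common_bins(counts, width):
    """Narrow bins shared by the modal group for every possible lower limit."""
    shared = None
    for offset in range(width):
        first, last, _ = max(regroup(counts, width, offset), key=lambda g: g[2])
        bins = set(range(first, last + 1))
        shared = bins if shared is None else shared & bins
    return sorted(shared)


# 10-cent counts from $.25-$.34 up to $1.75-$1.84, inferred (see above).
counts = [1, 15, 59, 85, 157, 113, 169, 201, 304, 685,
          99, 458, 466, 72, 202, 329]

for width in (2, 3):                      # 20-cent and 30-cent groups
    bins = common_bins(counts, width)
    limits = [f"${(25 + 10 * b) / 100:.2f}" for b in bins]
    print(f"{10 * width}-cent groups: common lower limits {limits}")
```

For the 20-cent groups the modal groups for the two lower limits have no bin in common, so the groups are not wide enough; for the 30-cent groups all three tabulations share the single bin beginning at $1.15, in agreement with the passage.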

Even when our numbers are initially regular, it is seldom
easy to determine the mode exactly. The difficulty
is best seen by an example. Suppose that we have the
following returns as to heights of men:

    67   in.     455
    67¼  „       475
    67½  „       490
    67¾  „       500
    68   „       485
    68¼  „       467
    68½  „       445


At first sight the mode appears to be at 67¾ in. exactly; but it
must be remembered that even in accurate measurements all
heights within ⅛ in. of 67¾ in. will be entered as 67¾ if the
measurements are taken to the nearest quarter inch, or will
have been tabulated in this way if the measurements were
more accurate. Hence 67¾ in. in reality stands for from 67⅝ to
67⅞ in. If the 500 heights so entered were distributed uniformly
through this interval, the mode might be given as 67¾ in.
with fair accuracy; but there are signs in the figures that the
mode is below this. Suppose that the figures in reality come
from the following measurements:

    From 67¼ to 67⅜ in.    238 }
      „  67⅜ „  67½ „      245 }  483 at 67⅜ in.
      „  67½ „  67⅝ „      245 }
      „  67⅝ „  67¾ „      250 }  495 at 67⅝ „
      „  67¾ „  67⅞ „      250 }
      „  67⅞ „  68  „      243 }  493 at 67⅞ „
      „  68  „  68⅛ „      242

and that these had been tabulated as in the last column, the
mode would appear as 67⅝ in.; while the same figures
tabulated as before gave it as 67¾ in. The probability of some such
shifting is seen from the original grouping, where the number at
67½ in. is greater than that at 68 in. From this discussion we
may see that the mode is always a little indefinite, depending
on the width of the groups in which the items are tabulated,
and on the exact position of the limits of the groups. As the
items we deal with become more numerous, we shall find
regularity when they are tabulated in narrower groups, and
the mode can be assigned with greater accuracy.

A mathematical method (p. 228) suggests that the mode of
such a group as given by the heights can be determined by
dividing the interval containing the mode (67⅝ to 67⅞ in.) in
proportion to the differences between the numbers registered
in this interval and the numbers in the adjacent intervals, viz.:
500-490 : 500-485 = 10 : 15. The mode so computed is at

$\left(67\tfrac{5}{8} + \tfrac{10}{10+15}\ \text{of}\ \tfrac{1}{4}\right)$ in. $= 67\tfrac{29}{40}$ in.

By this method if two
intervals contained the same numbers the mode would be
placed at the value dividing them, and if the numbers on
either side of that containing the greatest number were equal
(if the grouping was symmetrical) the mode would be placed
at the centre of the middle interval, in both cases as we
should have determined a priori.
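The rule of proportional division is easily put into code. The sketch below (Python, exact fractions) uses the figures of the example above; the function name `interpolated_mode` and its argument names are introduced here and are not the author's.

```python
from fractions import Fraction


def interpolated_mode(lower_limit, width, below, modal, above):
    """Divide the modal interval in the ratio (modal - below) : (modal - above).

    `below`, `modal` and `above` are the numbers registered in the interval
    before the modal one, in the modal interval itself, and in the one after.
    """
    d_below = modal - below
    d_above = modal - above
    if d_below < 0 or d_above < 0 or d_below + d_above == 0:
        raise ValueError("the middle interval must hold the greatest number")
    return lower_limit + width * Fraction(d_below, d_below + d_above)


# Heights example: interval 67 5/8 to 67 7/8 in., numbers 490, 500, 485.
mode = interpolated_mode(Fraction(541, 8), Fraction(1, 4), 490, 500, 485)
whole, part = divmod(mode, 1)
print(f"{whole} {part} in.")

# The two special cases noted in the text.
tied = interpolated_mode(Fraction(541, 8), Fraction(1, 4), 490, 500, 500)
symmetrical = interpolated_mode(Fraction(541, 8), Fraction(1, 4), 490, 500, 490)
print(tied, symmetrical)
```

When the modal interval and the next hold the same number the result is the limit dividing them (67⅞ in.), and when the grouping is symmetrical it is the centre of the interval (67¾ in.).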

Now is the "average workman" the man who earns $1.73
per diem, the simple average of the whole group on p. 69, or a
man making $1.20, the mode? In ordinary speech
the latter is meant. The "average clerk" is not
the one whose measurable qualities are an arithmetic mean of
all similar qualities, but one whose qualities are found in the
same degree in the greatest number of his fellows. There are
more clerks who read the evening paper than who read Homer,
more who go to music-halls than to oratorios, more whose
incomes are £100 than £500, more who live four miles from
the City than one or twenty. Even with this explanation the
average man is not a real creature, for fortunately no individual
has no qualities out of the common. The fact that the average
is a pure abstraction is of importance directly we apply
statistics to actual affairs; these American workpeople cannot be
legislated for in the mass as if they all earned $1.20, or as if
those who were alike in this did not differ in other respects,
even doing very varying quantities of work for this wage. No
single measurement expresses completely even the economic
condition of a group of workmen, but if we are
taking a single measurement, that of the "mode"
is often the most useful. It is at the mode that we find the
greatest number of whose greatest good we may be thinking.
Whereas the arithmetic mean and the "median" (defined
below) may correspond to no reality but be merely numerical
conceptions, the mode is precisely that number for which most
instances can be found. It shows the commonest result, that
most often obtained, and is of very general application. For
an intending passenger by train or 'bus, it is more important
to know the most ordinary than to know the average number
in a compartment. The mode rather than the average in
chest measurements is the number most suitable for the
ready-made clothier. For providing a post-office or a store,
the mode in postal orders or prices of tea needs to be known
rather than any other average. Even the favourite coin in a
collection may show the spirit of the congregation better
than the arithmetic average of their contributions. In these
last instances it may be noticed that the mode is quite definite.

A special feature of the mode is that it is entirely
uninfluenced by extremes. A cheque for £1000 in a collection
disturbs the arithmetic average, but not the
mode. The incomes of a small number of
millionaires and an army of paupers may have the same arithmetic
average as a nation composed entirely of people moderately
well off; but the modes will be very different in the two cases.
In considering the change year by year in a group of figures,
as for instance, the wages of a large group of workmen, we
cannot tell, if we take the arithmetic average as our criterion,
whether an improvement is due to a levelling up of the badly
paid or a rapid increase for those who were already well off,
while the mode will show the changing position of the main
body. Mr. Booth's London is crowded with instances of
the use of the mode. Each age diagram shows the mode in
ages for an occupation; each wage list that in wages. His
whole description of Class E, the typical workmen of modern
towns, is based on the same principle. His measurement of
social status, based on the number of rooms occupied or servants
employed, can be used easily for stating the mode (four rooms
to a family and no servant) but not any other average.

An objection to this average is that there are many groups
of figures to which it is not applicable. If we have a very
irregular group of numbers with no particular
type, such as the populations of towns in England,
the mode would be quite indefinite, and would give no
information of importance. The use of the mode is to indicate
the type from which other figures may be regarded as diverging.
Thus, in these wage figures, the type is about $1.20, and other
examples lie on either side, wages of men who have for some
reason or other more or less than the normal degree of skill
or opportunity. If there is a type, as in Quetelet's instances,
the mode will show it. The mode only tells us one fact,
however, about each type, and it is necessary to supplement
it with other measurements.

E. The Median. — When we are dealing with a group of 
persons or things, each of which possesses some measurable 
attribute, such as height or wage, we can choose certain quanti- 
ties which describe the group in brief. Suppose all the items 
arranged in a series in ascending order of the magnitude of
this attribute; the magnitude appertaining to the item
half-way up the series is called the median.* Thus if in a group
of wage-earners 200 earn less than 20s. 3d., one earns 20s. 3d.,
and 200 more, 20s. 3d. is the median wage. There are as
many items below 20s. 3d. in the supposed series as above it.
The magnitudes one-quarter and three-quarters up the series
are called the quartiles;* those one, two . . . nine-tenths up
are the deciles; those one, two . . . ninety-nine hundredths
up are the percentiles. The median is more definite in position
than the mode. When we are dealing with exact
measurements, if we have an odd number of items it is the middle one,
if an even number, it lies between the two middle items,
which are in general near together, or coincides with them if
they are equal. If the magnitudes are not given exactly,
but as within small limits, we can by the method described on
pp. 106-7 make a good estimate of their actual values. The
median is not affected by exceptional entries at all; the
existence of any number of millionaires has no more effect on the
median income than of an equal number of any other persons
whose incomes are above the median. For many purposes
it is of course necessary to allow these extreme instances more
weight than those which are nearer the average; but the
arithmetic average often gives them undue weight for this
democratic age, since a single millionaire can counterbalance
thousands of ordinary working men. A further advantage is
that it is extremely simple to find, not needing much
arithmetical work, for we need not do more than count those well
above and well below the average, and look more carefully
at those near it.

* These quantities have already been used in tabulation, p. 70.
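The behaviour described above can be seen with the standard `statistics` module of Python. In this sketch the group of 401 wage-earners follows the example in the text (200 below 20s. 3d., that is 243d., one at that sum, 200 above), but the particular lower and higher wages are invented. Note that `statistics.quantiles` uses its own interpolation rule, which need not agree in every case with the method of pp. 106-7.

```python
import statistics

# 200 men below 243d., one at 243d., 200 above (the other values are invented).
wages = [200] * 200 + [243] + [300] * 200
median_wage = statistics.median(wages)

# Replace the 200 higher wages by enormous ones: the median is unmoved,
# while the arithmetic average is carried far away.
with_millionaires = [200] * 200 + [243] + [100_000] * 200
median_after = statistics.median(with_millionaires)

print(median_wage, median_after)
print(statistics.mean(wages), statistics.mean(with_millionaires))

# Quartiles by the library's default rule; the middle one is the median.
quartiles = statistics.quantiles(wages, n=4)
print(quartiles)
```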

There is a yet more important advantage in the use of the
median; it can often be found exactly, when our information
as to the items in question is neither accurate nor
complete. This will be clear from one or two
examples. It may be that in a "wage census"
100,000 persons, whose wages were far below the average,
do not come into the returns at all, and it is very difficult
to estimate their effect on the arithmetic average for want
of information as to their earnings; but to find the median
exactly, we need only know their number, not their earnings;
and if we can only assign a maximum for their number, we still
can place the median within narrow limits. The addition of
100,000 men with wages below 15s. to a general summary for
the 356,000 men on p. 470 of the General Report on Wages in
1886 (C.-6889), would still leave the median in the group 20s.
to 25s. where it already is; the change would be very marked,
however, in the lower deciles and quartiles, and the arithmetic
average would be lowered by at least 2s. 1d. The same
argument applies to incomes; information is often very deficient,
but it is in many cases possible to assert that a number of
men, whose exact income is unknown, receive above a certain
assigned sum, or even between two assigned limits, which is all
we need to know about them to determine the median, if it
lies below the lower limit.
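The argument needs only the numbers in each grade, not the wages themselves, as the following sketch shows. The total of 356,000 and the addition of 100,000 men below 15s. are taken from the text; the division of the 356,000 among the grades is invented for the illustration, and `median_group` is a helper introduced here.

```python
def median_group(groups):
    """Return the label of the grade that contains the middle item.

    `groups` is a list of (label, count) pairs in ascending order of wage.
    """
    total = sum(count for _, count in groups)
    half = total / 2
    running = 0
    for label, count in groups:
        running += count
        if running >= half:
            return label
    raise ValueError("no groups were given")


# Hypothetical division of the 356,000 men among five grades.
returns = [
    ("under 15s.", 20_000),
    ("15s. to 20s.", 90_000),
    ("20s. to 25s.", 130_000),
    ("25s. to 30s.", 80_000),
    ("over 30s.", 36_000),
]
before = median_group(returns)

# Add 100,000 men of whom we know only that they earn less than 15s.
enlarged = [("under 15s.", 120_000)] + returns[1:]
after = median_group(enlarged)

print(before, "|", after)
```

With this invented division the median remains in the grade 20s. to 25s. after the addition, though a different division of the 356,000 might of course place it elsewhere.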

Again, in tracing the history of wages throughout the
century it is often very difficult to find the correct average,
but at the same time it is frequently possible to say that a
very large class of men earned below, say, 15s. a week, and
another very large class above 30s. whose wages we do not
exactly know, and a more definite number between 15s. and
20s., and 25s. and 30s.; and in order to find the median all
we need to do is to investigate more exactly the wages between
20s. and 25s., if that is the grade which contains it; and even
if we have not complete information here, we can still say
that the median certainly lies between certain narrow limits.
There is yet another advantage, perhaps more important, that
the median is applicable to quantities which are incapable
of measurement at all. This development
is especially due to Galton.* Suppose it to be required,
for example, to find among a large class of boys the average
in intelligence. It is clear that it is not easy to find the
arithmetic average of a quantity which cannot be properly measured
even by the most elaborate system of marks, but on the other
hand it would not be at all difficult with a class of, say, twenty
boys, to place them in order of intelligence without committing
oneself to such a statement as that A.'s cleverness was
25 per cent. more than B.'s; and the tenth or eleventh boy
in this arrangement would show the style of boys in the class,
at least as well as any other average. The disadvantage of
this method, the reason why it is not universally applicable,
is that the median of a series of observations
may be totally removed from its type, and in
fact may not be situated near any of the different objects
which are observed. Thus, if we had two large groups of
wages of a thousand men between 15s. and 25s., and another
thousand between 35s. and 45s., the median would give us
any position between 25s. and 35s., where as a matter of fact
not a single wage-earner would be found. The median is
then chiefly useful when we are dealing with a series of objects
of which the main part lie fairly close together; a few extremes
do not affect it.†

If m is the median and a the arithmetic average of n quantities
$x_1, x_2, \ldots, x_n$, and we call $x_1 - z$, $x_2 - z$, ... the
deviations of the x's from any quantity z, then m is the value of z
which makes the sum of the deviations (all taken positively) a
minimum, a is the value which makes the sum of the squares a
minimum. The first statement becomes obvious from the following analogy:
suppose $2n + 1$ places in a straight line are each served by a single
wire from a telephone exchange at the $n$th place from one end; the
lengths of the wires correspond to the deviations; now if the exchange
is moved to the $(n+1)$th (or central) place, $n + 1$ wires are
shortened and n wires lengthened each by the same distance, so that the
aggregate of wire is diminished; if the number of places is even, the
minimum is obtained at any position at or between the $n$th and
$(n+1)$th from either end. For the second, we notice that
$\sum x = na$, and that
$\sum (x - z)^2 = \sum x^2 - na^2 + n(a - z)^2$, which is a minimum
when $z = a$.
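Both statements of the note, and the identity used for the second, can be checked on a small set of numbers. The five values below are invented; the search over whole-number positions z is only a check on this one example, not a proof.

```python
import statistics

values = [2, 3, 5, 9, 21]                 # hypothetical quantities x
n = len(values)
m = statistics.median(values)             # the median
a = sum(values) / n                       # the arithmetic average


def sum_abs(z):
    """Sum of the deviations from z, all taken positively."""
    return sum(abs(x - z) for x in values)


def sum_sq(z):
    """Sum of the squares of the deviations from z."""
    return sum((x - z) ** 2 for x in values)


candidates = range(0, 25)
best_abs = min(candidates, key=sum_abs)   # where the absolute sum is least
best_sq = min(candidates, key=sum_sq)     # where the sum of squares is least
print(m, best_abs, a, best_sq)

# Identity: sum (x - z)^2 = sum x^2 - n a^2 + n (a - z)^2, tried at z = 3.
z = 3
left = sum_sq(z)
right = sum(x * x for x in values) - n * a * a + n * (a - z) ** 2
print(left, right)
```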

The following table shows the description of 76 items by 
the help of the various averages now described : — 

* See, for instance, Natural Inheritance, p. 47.

† On the relative advantages of this, and a more mathematical method,
see Yule and Galton in the Statistical Journal for 1896, especially pp. 392-[?].

Measurements of Boys of Ages 13 to 15 Years.


No. 

Age. 

Height. 

> 

Weight 

No. 

Age. 

Height. 

Weight. 

Tabulation of 
Weights. 

I 

> rs. mth. 

14. I 

ft. In. 

4.114 

*t. lb. * 

6.0$ 

39 

yn. mth. 

14.7 

ft In. 
4.II* 

•t. lb. 

63J 

Arithmetic aver* 

2 

I4.9 

4. IO 

5-7 

40 

13-1 

4 - iij 

57 

age, 6 st. 1* lbs. 

3 

14.7 

5 - 5 * 

7.5 

4 i 

14-3 

4 . II 

6-41 


. 4 

I 3 -II 

5 -o 

6 - 3 * 

42 

13.3 

4.4* 

4 -Ml 

The same, when 

5 

14 . II 

5 - 3 l 

8.0* 

43 

14*3 

5-3 

6.7* 

weights are en- 

6 

14.7 

4 - IO 

5 -o 

44 

13.6 

5.1* 

613! 

tered only to 

7 

14.3 

4. 10 

6.7 

45 

14.2 

4-83 

6.0$ 

nearest stone, 

8 

i i -9 

5-5 

8.51 

46 

13-5 

5-2 

7-4 

6 st. 1* lbs. 

9 

14. 11 

4-94 

S - I2 1 

47 

13-8 

5 - 2 * 

6. 11 


10 

M 3 

4-iiJ 

6.11$ 

48 

14.6 

5*4 

7 - 4 * 

Median, 6 stones 

11 

134 

4-7 

5 -i* 

49 

14.8 

5 -i* 

6. 10 

i* lbs. 

12 

14.7 

5 - 3 ft 

7-8* 

50 

13-3 

4.81 

50 


13 

13.8 

4-72 

5*3 

5 i 

130 

51* 

6.7 

Quartiles, 6 st. gl 

14 

M .5 

5 - 2 $ 

7.8$ 

52 

13.10 

4 -lli 

7 - 3 * 

lbs., 5 st. 6* lbs. 

15 

14.4 

5 -o 

6.0 

53 

14.8 

4' ”4 

6-91 


16 

13.6 

4.9 

5-6 

54 

13-8 

4-52 

4-91 

Average of quar* 

17 

14.0 

5 - 2 * 

7 - 7 $ 

55 

14.8 

5 - 4 * 

7.0 

tiles, 6 st. I lb. 

18 

13.0 

4 - 8 * 

53 

56 

14.0 

4. 10 

6.2* 


*9 

14.7 

4. u 

6. 12$ 

57 

13.10 

4 9 

5-5 

Half of the ex- 

20 ; 

14. 10 

5 * 1 

6.9 

58 

132 

5 °i 

6.4 

amples lie within 

21 

13-9 

4.11 

5.11 

59 

13-6 

4 7 

5.2* 

9 lbs. of median 

22 ! 

14. 10 

4 - 8 * 

5. II 

60 

13.0 

4-9 

5-92 


23 

13-4 

4.94 

5 -Sf 

61 

13-3 

4 - 8 $ 

5 - 5 * 

Mode is between 


13.1 

5 - 2 i 

6. i 

62 

* 3-5 

4 . 8 * 

6-52 

6 st. and 6* st. 

25 

14.0 

4 - 6 * 

5.6* 

63 

13.10 

5 * 5 * 

7.10* 


26 

14.6 

5-34 

7.6* 

64 

I 3 « I 

4.8$ 

6.2* 

Average weight 

27 

M 3 

5 -°* 

5.11$ 

65 

13.10 

5-4 

7*2 

between ages 13 

28 

13-9 

4.9 

5.11 

66 

14.0 

4-9 

5 °i 

and 13* years, 

29 

13-4 

5 - 1 * 

5-9 

67 

13-3 

47 

5 -o 

5 st. 9* lbs. ; 13* 

30 

14.4 

5-1 

6.8* 

68 

13.8 

4. 1 1 

6.1* 

and 14 years, 5 st 

31 

14. 10 

4 - 9 * 

4 - 7 * 

69 

13-7 

4.11$ 

6 - 4 * 

13* lbs. ; 14 and 

32 

13.2 

4 - 9 * 

5 - 13 * 

70 

13. 1 1 

4-8 

4.4* 

14* years, 6 st. 3* 

33 

14. 1 

4 - 8 * 

5 - 8 * 

7 i 

13. u 

4.8 

4.4* 

lbs. ; 14* and 1 5 
years, 6 st. 8§ lbs. 

34 

ij.io 

5 - 2 * 

6.8* 

72 

13.2 

4 7 l 

4. 10 

35 

14.0 

4. 1 1* 

5-7 

73 

14.0 

4. 1 1 

6.5 


36 

14.4 

4. 11 

6-5 

74 

13.3 

4 . 3 * 

4.1* 

Heights may be 

37 

14.8 

4.11 

6.0$ 

75 

13-3 

5.0 

7.2$ 

tabulated in the 

38 

. 13-7 

50 * 

6.2 

76 

137 

4 - 8 * 

5-6 

same way. 


Heights arranged in order of magnitude (in.) — 

5i$. 52$, 53 !- 54$. 55, 55- 55, 55i. 55|, 56. 
56, 5 <>i, 56 $, 56$, 56$, 56$, 5 56 |, 5 &I; 
5^2, 57, 57, 57, 57, 57, 5 7$, 5 7$, 57$, 57$, 

58, 58, 58, 58, 59, 59, 59, 59- 59; 

59, 59, 59$, 59$, 59$, 59$, 59$, 59$, 592, 592. 

60, 60, 60, 6o£ ( 6o|, 6o|, 61, 61, 61J ; 

61k 61k 61k 6 ’ z , 62k 62k 62k 62k 62k 63, 
634. 632, 632, 64, 64, 64$, 65, 65I, 65$. . 
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A graphic method of finding the median of these heights 

closely is given by Mr. Galton in the Report of the Anthropometric
Committee of the British Association, 1881, p. 247; and is illustrated
by the diagram facing this page.

On a horizontal line mark off equal intervals representing 
units of measurement, say inches. On a vertical scale mark 
off equal intervals representing the number of instances, 
i.e., persons whose heights are measured. Beginning at the 
lowest, 51½ in., on a vertical line mark as many dots at
equal intervals on the vertical scale as there are persons at
that height (in this case only one), so that each dot represents
one person. From the highest dot thus marked, suppose a
horizontal line drawn till it is over the next height division
at which there is an instance, 52½ in., and with this new
base proceed as before, marking each instance at 52½ in.
by a dot vertically above the 52½-in. mark. Next draw a
connected line through the middle points of the consecutive 
vertical rows of dots ; if there is an odd number of dots, the 
middle one is taken as the middle point ; if an even number, 
the middle point is half-way between the middle ones. 

On the vertical scale mark the positions of the median, 
quartiles, etc., obtained by dividing the distance representing 
the total number of instances into appropriate parts, and 
through these points draw horizontal lines to intersect the 
connected line already drawn. The points of intersection 
lie vertically above the heights required, as marked on the
horizontal scale. 

Now it may be assumed that the heights of all persons 
returned at, say, 58I in., are in reality evenly distributed 
between the limits 58I and 58J in., heights lying within 
which would be so returned ; and it can be verified that the 
construction just given shows the place of the median, deciles, 
etc., almost exactly on this hypothesis. 

The following analysis is only important when the number of instances
is small, and the position of the quartiles, etc., is not evident. There are
two cases, (1) where the observations are exact, (2) where the observations
are given in grades or to the nearest scale mark.

(1) The following 45 numbers are the numbers of minutes occupied by
trains on a certain distance according to time-table:

45, 46, 47, 48, 48, 51, 53, 54, 55, 58; 61, 61, 62, 65, 65, 69, 69, 69, 71, 76;
76, 76, 77, 77, 78, 80, 81, 81, 82, 82; 83, 83, 84, 85, 85, 85, 85, 87, 88, 89;
90, 92, 94, 101, 103.



[Diagram: GRAPHIC METHOD OF FINDING MEDIAN, QUARTILES AND DECILES
(after Galton: Anthropometric Committee: Brit. Assn.). For the heights of
the 76 boys, between ages of 13 and 15. Horizontal scale: height, 51 in. to
66 in. Values marked on the diagram: median 59½ inches; half inter-quartile
distance 2.2; deciles 55.6, 56.6, 57, 57.9, 59.7, 60.7, 62.?, 63.6;
arithmetic average 59.095; greatest density 57 or 59 (in smoothed curve
would be about 58); geometric average 58.98.]

The median is the 23rd instance, viz. 77 minutes.

To find the quartiles we must divide the 45 numbers into 4 equal parts.
Suppose that on such a scale as the vertical scale in the diagram, p. 106,
the instances are entered at ½, 1½, . . . 44½, the distance 45 representing the
whole space to be divided. The quartiles are at 11¼ and 33¾. 11¼ is between
the 11th instance (61) at 10½ and the 12th instance (also 61) at 11½, and
the lower quartile is 61. 33¾ comes between the 34th and 35th instances,
both 85. If the entries were not equal we might take ¾ of the nearer entry
(the 34th) + ¼ of the 35th.

Similarly the deciles are at the marks 4½, 9, 13½ . . . on the scale.
The lowest is at the 5th entry (48 min.), the next half-way between the
9th and 10th (56½ min.), and so on.

The positions of the D's, Q's and M on the diagram are marked on this
principle.
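
The rule just described (instances entered at ½, 1½, ... on a scale whose whole length is the number of instances) can be sketched as a short function. The name `scale_quantile` and its arguments are hypothetical; the data are the 45 train times given above.

```python
from fractions import Fraction
from math import floor

def scale_quantile(values, k, parts):
    """k-th of `parts` equal divisions, with the i-th instance entered at
    i - 1/2 on a scale of total length len(values)."""
    xs = sorted(values)
    n = len(xs)
    mark = Fraction(k * n, parts)       # e.g. 45/4 = 11 1/4 for the lower quartile
    t = mark - Fraction(1, 2)           # 0-based fractional index of that mark
    if t <= 0:
        return Fraction(xs[0])
    if t >= n - 1:
        return Fraction(xs[-1])
    lo = floor(t)
    w = t - lo                          # share given to the instance above the mark
    return (1 - w) * xs[lo] + w * xs[lo + 1]

times = [45, 46, 47, 48, 48, 51, 53, 54, 55, 58, 61, 61, 62, 65, 65,
         69, 69, 69, 71, 76, 76, 76, 77, 77, 78, 80, 81, 81, 82, 82,
         83, 83, 84, 85, 85, 85, 85, 87, 88, 89, 90, 92, 94, 101, 103]

print(scale_quantile(times, 1, 2),     # median
      scale_quantile(times, 1, 4),     # lower quartile
      scale_quantile(times, 3, 4),     # upper quartile
      scale_quantile(times, 1, 10),    # lowest decile
      scale_quantile(times, 2, 10))    # second decile
```

Exact fractions are used so that marks such as 11¼ and 33¾ are not disturbed by rounding.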

We have the following scheme for the median and quartiles:

No. of Cases.   Median.                     Lower Quartile.              Upper Quartile.
4n              ½{(2n)th + (2n+1)th}        ½{(n)th + (n+1)th}           ½{(3n)th + (3n+1)th}
4n + 1          (2n+1)th                    ¼(n)th + ¾(n+1)th            ¾(3n+1)th + ¼(3n+2)th
4n + 2          ½{(2n+1)th + (2n+2)th}      (n+1)th                      (3n+2)th
4n + 3          (2n+2)th                    ¾(n+1)th + ¼(n+2)th          ¼(3n+2)th + ¾(3n+3)th

A similar scheme could be worked out for the deciles.

(2) If the numbers are given in grades (whether as between, say, 53 and
54 in., or as at 53 in. to the nearest J in., i.e., between 5 2J and 53^ in.), they 
may be regarded as spaced uniformly through the grade, and then the 
method of case (1) applied. 

The method can be illustrated from the ages of married men, column 6
of the table, p. 86. Here 549 men are over 40 years, 413 over 45 years;
the 500th man is one of the 136 in the grade 40 to 45 years, in fact the 48th
or 49th man in that grade. If they are uniformly distributed, the 49th is
at the 49th of 136 equal intervals in which the 5 years may be divided.
Hence the median is at 40 + 49/136 of 5 = 41.80 years. It is not worth
while to try to place it more exactly. Similarly the lower quartile is the age
where we find the 750th man, somewhere in the grade 30-35 years, and
may be taken as 30 + … of 5 = 33.45 years, and the upper quartile at
50 + … of 5 = 52.34.

Simple graphic methods may readily be found for either case.
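
The interpolation within a grade can be sketched as below, using only the figures quoted above for the median (136 men in the grade 40 to 45, of whom the 49th from the lower end is wanted). The function name and its parameters are hypothetical.

```python
def grade_quantile(lower_limit, width, number_in_grade, place_in_grade):
    """Place of the wanted instance, assuming the instances in the grade
    are spaced uniformly through it."""
    return lower_limit + width * place_in_grade / number_in_grade

place = 549 - 500        # 549 men are over 40; the 500th is the 49th of them
median_age = grade_quantile(40, 5, 136, place)
print(round(median_age, 2))
```

The same call would serve for the quartiles once the numbers in the grades 30-35 and 50-55 are read from the table on p. 86.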

F. Geometric Mean. If $a_1, a_2, \ldots, a_n$ are $n$ quantities,
the geometric or logarithmic mean $G$ is given by

$$G = \sqrt[n]{a_1 a_2 \cdots a_n},$$

and $$\log G = \frac{1}{n}(\log a_1 + \log a_2 + \ldots + \log a_n).$$

The geometric mean is always less than the arithmetic
mean of the same quantities, unless the quantities are all equal.

This mean is appropriately used when emphasis is on the
ratio between two quantities rather than on their absolute
difference. If the difference between 8 and 13 is of the same
importance as that between 13 and 18, then the mean of
8 and 18 is properly taken to be 13, equidistant from either;
but if the ratio 8 to 12 is of the same importance as that of
12 to 18, then the mean of 8 and 18 is properly taken as
$12 = \sqrt{8 \times 18}$.
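
As a small illustration, the logarithmic form of the definition can be used directly. This is a sketch only; the function name is hypothetical and the quantities are assumed positive.

```python
import math

def geometric_mean(quantities):
    """Antilogarithm of the average of the logarithms (positive quantities only)."""
    logs = [math.log(q) for q in quantities]
    return math.exp(sum(logs) / len(logs))

g = geometric_mean([8, 18])
a = (8 + 18) / 2
print(round(g, 6), a)     # geometric mean beside the arithmetic mean
print(8 / g, g / 18)      # the two ratios agree
```

For 8 and 18 the geometric mean stands in the same ratio to each quantity, whereas the arithmetic mean stands at the same distance from each.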

We obtain an analogy as follows: Of five quantities $a_1$,
$a_2, a_3, a_4, a_5$, let $a_1$ and $a_2$ be less than $A$ (the arithmetic mean
of all) and also less than $G$, and the others be greater.

Then $5A = a_1 + a_2 + a_3 + a_4 + a_5$,

and $(A - a_1) + (A - a_2) = (a_3 - A) + (a_4 - A) + (a_5 - A)$;

and $G^5 = a_1 \times a_2 \times a_3 \times a_4 \times a_5$,

and $\dfrac{G}{a_1} \times \dfrac{G}{a_2} = \dfrac{a_3}{G} \times \dfrac{a_4}{G} \times \dfrac{a_5}{G}$.

Thus in one case the sum of the excesses of the mean equals
the sum of its defects; in the other the product of the ratios
of the mean to the quantities less than it equals the product
of the ratios of the greater quantities to the mean.

An important use of the mean is in connection with prices. 
A general rise of prices from 100 to 120 is exactly the same 
from many points of view as a rise from 120 to 144, and is 
greater than a rise from 120 to 140. This consideration may 
have led Jevons to use the geometric mean in his first treatment 
of index-numbers ( Fall in the Value of Gold). 

It should be noticed that the geometric mean gives greater 
importance to small numbers and less to large than does the 
arithmetic. 

G. General. The function of means will now be clear;
it is to express a complex group by a few simple numbers.
The mind cannot grasp the magnitudes of millions
of items at once; they must be grouped, simplified,
averaged. The means chosen must be those which
will give the striking features and the essential characteristics
of the group. Different methods will apply to groups of
various classes; each must be taken on its own merits. A
good and suitable mean has the following characteristics:
If there is a type it shows it; it gives due influence to extreme
cases; it is not easily affected by errors or much displaced by
slight alterations in systems of calculation; and it is easily
calculated.

The relative positions of the different kinds of means
dealt with give some information as to the general nature of
the group to which they refer. The arithmetic mean,
median and mode, are coincident, if the group is symmetrical.
The arithmetic mean is probably above the median, if we
have a small group at a high degree. The arithmetic mean
is generally below the median, if there is an absence of high
numbers, and a concentration a little above the mean.
The mode will be badly defined, if our group is not
homogeneous. The mode will probably be below the arithmetic
mean, if there is a small group at a high degree. The mode
is well marked, if the distribution is uniform. These rules are
only tentative and easily nullified by exceptional circumstances.



CHAPTER VI. 


MEASUREMENTS OF DISPERSION AND OF SKEWNESS. 

APPLICATION OF AVERAGES. 

Measurements of Dispersion and of Skewness. 

In the sections of Chapter V which relate to “means” we
have been concerned principally with considering the central
position of a statistical group, where by the term
statistical group we mean a number of persons or
things possessing certain defined attributes (enumerated, in
England or Wales, in 1911, male) and grouped according to a
variable attribute (age). We can exhibit such a group either
by tabulation in grades or otherwise (pp. 69-70) or by a diagram 
(p. 127), but for purposes of brevity or for comparison with 
other groups we need to define and calculate measurements 
related to the group in such a way as to show its characteristics. 
For this purpose it is convenient to choose (i) a mean which 
locates a central position, (ii) a measurement of the dispersion, 
variation or scattering of the observations, and (iii) a measure- 
ment of imperfect symmetry. We proceed to the discussion 
of (ii) and (iii). 

The differences between the measurements of the items of
the group and a mean or other fixed point are called deviations.
In the table (p. 111) the group taken contains the
death-rates of the aggregate of large towns in the
52 weeks of the year 1902. These are arranged in order of
magnitude down column 1 and up column 2. In columns 3
and 4 are shown the deviations from the quantity 173, selected
as being near the median, 172½. It was shown on p. 104 that
the total and therefore the average of deviations (all taken
positively) is least when they are measured from the median;
to obtain such deviations we must add ½ to each entry in column
3 and subtract ½ from each entry in column 4, i.e. add and
subtract 13 to or from the totals. The total of the positive
deviations from the median is then 447, of negative is 388, and
of the 52 deviations (irrespective of their sign) is 835. The
average of these deviations is 835 ÷ 52 = 16.06. To


DEATH-RATES WEEK BY WEEK IN 1902 IN THE AGGREGATE OF
GREAT TOWNS IN ENGLAND AND WALES.

Cols. 1, 2: weekly death-rates per 10,000 living in order of magnitude (a, b).
Cols. 3, 4: for mean deviation from median; excess of a over 173, of 173 over b.
Cols. 5, 6: for standard deviation; squares of differences from 173.
Cols. 7, 8, 9: for mean difference; difference between a and b, multiplier, product.

Col. 1  Col. 2  Col. 3  Col. 4  Col. 5  Col. 6  Col. 7  Col. 8  Col. 9
  a       b
 244     136      71      37    5,041   1,369    108      51    5,508
 233     139      60      34    3,600   1,156     94      49    4,606
 226     141      53      32    2,809   1,024     85      47    3,995
 209     143      36      30    1,296     900     66      45    2,970
 206     144      33      29    1,089     841     62      43    2,666
 201     145      28      28      784     784     56      41    2,296
 196     149      23      24      529     576     47      39    1,833
 196     150      23      23      529     529     46      37    1,702
 196     151      23      22      529     484     45      35    1,575
 191     152      18      21      324     441     39      33    1,287
 183     154      10      19      100     361     29      31      899
 182     155       9      18       81     324     27      29      783
 182     159       9      14       81     196     23      27      621
 181     160       8      13       64     169     21      25      525
 179     164       6       9       36      81     15      23      345
 177     165       4       8       16      64     12      21      252
 177     166       4       7       16      49     11      19      209
 177     166       4       7       16      49     11      17      187
 176     167       3       6        9      36      9      15      135
 176     169       3       4        9      16      7      13       91
 176     169       3       4        9      16      7      11       77
 174     169       1       4        1      16      5       9       45
 174     170       1       3        1       9      4       7       28
 174     170       1       3        1       9      4       5       20
 173     172       0       1        0       1      1       3        3
 173     172       0       1        0       1      1       1        1

Totals: cols. 1 and 2 together, 9,029; col. 3, 434 (+13); col. 4, 401 (−13);
cols. 3 and 4 together, 835; cols. 5 and 6 together, 26,491; col. 9, 32,659.

Arithmetic average 9029 ÷ 52 = 173.63 approx.
Median 172½.
Quartiles 159½, 181½.
Mean deviation from the median: η = 835 ÷ 52 = 16.06 approx.; from the average, η = 16.11.
Quartile deviation, or probable error, r = ½(181½ − 159½) = 11. Half the cases are within
170½ ± 11.
Standard deviation: σ = √{26,491 ÷ 52 − (.63)²} = 22.561. Coefficient of variation
100σ ÷ 173.63 = 13.0.
Mean difference: g = 32,659 ÷ ½ of 52 × 51 = 24.63.


obtain the sum of the deviations from the arithmetic average
(173.63) we must add .63 to each of the deviations
from 173 of the 28 quantities less than 174 and
subtract .63 from the remaining deviations; the total is then
837.52 and the average 16.11. The average of the differences
between the various measurements and their arithmetic average
(16.11 in this case) is called the mean deviation of the group,
and often denoted by the letter $\eta$; we may also use the term
mean deviation from the median (16.06). The mean deviation
is an obvious and convenient measurement of the dispersion of
the group, and where the observations are recorded singly and
not merged in grades is easy to calculate.

We can obtain the arithmetic average from columns 3 and 4
at once by the consideration that the average excess over 173 is
the total of column 3 (434) less the total of column 4 (401), divided
by 52, = 33 ÷ 52 = .63 approx.; the average is therefore 173.63 approx.
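
The arithmetic of the last few paragraphs may be verified with a short sketch. The 52 death-rates are those of columns 1 and 2 of the table on p. 111 as read here; the variable names are hypothetical.

```python
from statistics import median

# Columns 1 and 2 of the table on p. 111 (as read from the table).
a = [244, 233, 226, 209, 206, 201, 196, 196, 196, 191, 183, 182, 182,
     181, 179, 177, 177, 177, 176, 176, 176, 174, 174, 174, 173, 173]
b = [136, 139, 141, 143, 144, 145, 149, 150, 151, 152, 154, 155, 159,
     160, 164, 165, 166, 166, 167, 169, 169, 169, 170, 170, 172, 172]
rates = a + b
n = len(rates)

med = median(rates)
avg = sum(rates) / n

# Short method: average = working origin + average excess over the origin.
excess = sum(x - 173 for x in rates)          # column 3 total less column 4 total
avg_short = 173 + excess / n

md_from_median = sum(abs(x - med) for x in rates) / n
md_from_average = sum(abs(x - avg) for x in rates) / n
print(med, round(avg, 2), round(md_from_median, 2), round(md_from_average, 2))
```

Working from the origin 173 gives the same average as the direct sum, and the two mean deviations come out as stated in the text.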

In the mathematical treatment of statistical groups it is
found inconvenient to handle these absolute deviates since
in algebra they appear some as positive and others as negative,
and when the theory of probability is applied it is found that
the importance of the deviation depends on its square and not
on its first power. Accordingly the average of the squares of
the deviations from the arithmetical average of the group is
taken, and the square root of the average obtained is called the
standard deviation of the group; this measurement
of dispersion is in general use and is denoted by the
letter $\sigma$. It can be calculated by writing down the deviations
exactly, but the procedure is greatly simplified as follows. Let
$x_1, x_2, \ldots, x_n$ be the measurements, $x_0$ the central quantity
from which the deviations are most conveniently measured.
Write $d_1 = x_1 - x_0$, $d_2 = x_2 - x_0$, etc., i.e. for the deviations
as tabulated. Let $\bar{x}$ be the arithmetic average of the group,
so that $n\bar{x} = x_1 + x_2 + \ldots + x_n$; and let $d_0 = \bar{x} - x_0$, so
that $nd_0 = (x_1 - x_0) + (x_2 - x_0) + \ldots = d_1 + d_2 + \ldots + d_n$.
Then by definition

$$
\begin{aligned}
\sigma^2 &= \{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \ldots + (x_n - \bar{x})^2\} \div n \\
&= \{(d_1 - d_0)^2 + (d_2 - d_0)^2 + \ldots\} \div n \\
&= \{d_1^2 + d_2^2 + \ldots - 2d_0(d_1 + d_2 + \ldots) + nd_0^2\} \div n \\
&= \{d_1^2 + d_2^2 + \ldots - nd_0^2\} \div n, \quad \text{since } d_1 + d_2 + \ldots = nd_0,
\end{aligned}
$$

and $\sigma = \sqrt{\dfrac{S d^2}{n} - d_0^2}$, where $S d^2$ is written for
$d_1^2 + d_2^2 + \ldots + d_n^2$.

In the table $\bar{x} = 173.63$, $x_0 = 173$, $d_0 = .63$.

$d_1^2, d_2^2, \ldots$ are given in columns 5 and 6, and $S d^2 = 26{,}491$.

Therefore $\sigma = \sqrt{26{,}491 \div 52 - .63^2} = 22.56$ approx.

The standard deviation is always measured in relation to
the arithmetical average, not to the median.
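
The short method may be compared with the direct definition in a sketch such as the following, which uses the death-rates as read from columns 1 and 2 of the table. Recomputed from those entries, $S d^2$ comes to 26,471, which is 20 less than the printed total; the resulting $\sigma$ still agrees with the value in the text to within about 0.01.

```python
from math import sqrt
from statistics import pstdev

a = [244, 233, 226, 209, 206, 201, 196, 196, 196, 191, 183, 182, 182,
     181, 179, 177, 177, 177, 176, 176, 176, 174, 174, 174, 173, 173]
b = [136, 139, 141, 143, 144, 145, 149, 150, 151, 152, 154, 155, 159,
     160, 164, 165, 166, 166, 167, 169, 169, 169, 170, 170, 172, 172]
rates = a + b
n = len(rates)

x0 = 173                                  # convenient working origin
d = [x - x0 for x in rates]               # deviations as tabulated
S_d2 = sum(v * v for v in d)              # sum of columns 5 and 6
d0 = sum(d) / n                           # average minus working origin
sigma_short = sqrt(S_d2 / n - d0 ** 2)    # short method
sigma_direct = pstdev(rates)              # root-mean-square deviation from the average
print(S_d2, round(d0, 2), round(sigma_short, 2))
```

`statistics.pstdev` divides by $n$, as the text does, so the two results should coincide apart from rounding in the machine arithmetic.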
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A much simpler measurement of dispersion is obtained by 
the use of the quartiles. The difference between the quartiles
is evidently related to the dispersion, though it has the weakness
that the same measurement would be obtained from groups
whose quartiles were the same, however the observations
between the quartiles were distributed, and however far the
observations outside the quartiles were placed. It is therefore
much less sensitive than the mean or the standard deviations.
The measurement used, however, is not the whole distance
between the quartiles but half that distance, and
we may call the half-distance between the quartiles
the quartile deviation; it is commonly denoted by the letter r.
r is approximately, but not in general exactly, equal to the
median of the deviations. In the table the quartiles are 159½
and 181½, the distance between them is 22 and therefore r = 11.

The median is not necessarily half-way between the quartiles.
In the case before us this half-way mark is ½(159½ + 181½) = 170½,
and the quartiles are 170½ ± 11. We may then describe the
group very simply, as follows: the arithmetic average is 173.6
(or the median is 173) and half the observations are within the
range 170½ ± 11.
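
A sketch of this computation for the 52 death-rates follows. It applies the rule for 4n cases from the scheme on p. 107 (each quartile half-way between two adjacent entries); the variable names are hypothetical.

```python
a = [244, 233, 226, 209, 206, 201, 196, 196, 196, 191, 183, 182, 182,
     181, 179, 177, 177, 177, 176, 176, 176, 174, 174, 174, 173, 173]
b = [136, 139, 141, 143, 144, 145, 149, 150, 151, 152, 154, 155, 159,
     160, 164, 165, 166, 166, 167, 169, 169, 169, 170, 170, 172, 172]
xs = sorted(a + b)

n = len(xs) // 4                               # 52 = 4n with n = 13
lower = (xs[n - 1] + xs[n]) / 2                # half-way between n th and (n+1) th
upper = (xs[3 * n - 1] + xs[3 * n]) / 2        # half-way between 3n th and (3n+1) th
r = (upper - lower) / 2                        # quartile deviation
midpoint = (upper + lower) / 2

# Count the observations lying within midpoint ± r.
inside = sum(1 for x in xs if midpoint - r <= x <= midpoint + r)
print(lower, upper, r, midpoint, inside)
```

The count confirms that exactly half of the 52 observations lie within the stated range.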

In a symmetrical group the arithmetic average and the mode 
are coincident, and r is called the probable error, a term that is 
convenient in some respects, but suggests misleading ideas. 

If the data are given in grades a modification of method is
necessary and the measurements can only be approximate.
Take for example the table of ages on p. 86. The median and
quartiles have already been found (p. 107) as
41.80, 33.45, and 52.34 years. The quartile
deviation is therefore ½(52.34 − 33.45) = 9.45 years, and half
the cases are in the range 42.90 ± 9.45 years. The deviations
from 42½ years are given in column 2. If we assume all the
entries in each grade to be concentrated at the middle point
of the grade, column 4 shows the aggregate deviations in each
grade, and the sum of the numbers irrespective of sign, viz.
1155 + 926 = 2081, is the total of the deviations. The mean
deviation from 42½ years is then approximately 2081 ÷ 1000
of 5 years = 10.40 years. A small correction is needed to
obtain the mean deviation from the median or from the average
(. . .). Rather troublesome additions are needed to allow
for the deviations of the 136 entries in the zero grade and to
correct for the supposed concentration at the middle points.
These become negligible if the grading is sufficiently fine (entries
for every year would be sufficient in this case), and it is only
then that the use of the mean deviation for graded data is
recommended. The origin should be taken at the centre of
that grade which contains the average or median, whichever is
the starting-point for measuring the deviations.

No new principles are involved in the calculation of the
standard deviation in such cases, when the grading is fine.
Examples are given in Part II, Chap. I below.

Professor Corrado Gini has introduced a new measurement
of variation (Variabilità e Mutabilità; Fascicolo 1°; Bologna,
1912, pp. 19 seq.). He contends that the problem that arises
in the study of the variability of demographic, anthropological,
biological or economic characters is How much do the different
magnitudes differ between themselves? and not How much do
diverse measurements differ from their arithmetic mean? The
second question is appropriate in physical science, but not in
the description of groups. Accordingly he proposes as a
measurement the arithmetic mean of the ½n(n − 1) differences
that are to be found between n quantities. This
we may call the mean difference and denote it by
the letter g. It has not yet come into general use, possibly
because (except in the simplest cases) the arithmetic involved in
its calculation is indirect and rather arduous; but it cannot be
denied that the conception is simple and logical.

Let $a_1, a_2, \ldots, a_n$ be $n$ quantities, arranged in ascending order.
Then

$$
\begin{aligned}
g \times \tfrac{1}{2}n(n-1) ={}& (a_n - a_1) + (a_n - a_2) + \ldots + (a_n - a_{n-1}) \\
&+ (a_{n-1} - a_1) + (a_{n-1} - a_2) + \ldots + (a_{n-1} - a_{n-2}) \\
&+ \ldots \\
&+ (a_3 - a_1) + (a_3 - a_2) \\
&+ (a_2 - a_1) \\
={}& (n-1)a_n + (n-3)a_{n-1} + (n-5)a_{n-2} + \ldots \\
&+ (1-n)a_1 + (3-n)a_2 + (5-n)a_3 + \ldots \\
={}& (n-1)(a_n - a_1) + (n-3)(a_{n-1} - a_2) + (n-5)(a_{n-2} - a_3) + \ldots
\end{aligned}
$$

The computation is readily performed as in columns 7, 8, 9 of
the table, p. 111, where n is an even number. If n is odd the
central number occurs by itself with a zero multiplier.

The relation between $g$ (the mean difference) and $\eta$ (the mean
deviation) can be exhibited as follows:

Let $d_1, d_2, \ldots, d_n$ be the differences, all taken as positive,
between the median and $a_1, a_2, \ldots, a_n$.

Then $a_n - a_1 = d_1 + d_n$; $a_{n-1} - a_2 = d_2 + d_{n-1}$, etc.,

and $$g = \frac{2}{n}\Big\{(d_1 + d_n) + \frac{n-3}{n-1}(d_2 + d_{n-1}) + \frac{n-5}{n-1}(d_3 + d_{n-2}) + \ldots\Big\},$$

while $$\eta = \frac{1}{n}\{d_1 + d_n + d_2 + d_{n-1} + d_3 + d_{n-2} + \ldots\}.$$

In $g$ more than average importance is given to the extreme
variations, and $g$ is always greater than $\eta$. E.g., if the observations
are spaced at equal intervals ($k$), it can be shown that $g$ is
approximately $\eta \times \frac{4}{3}$; for, in this case, if $n = 2m + 1$, it is found that
$g = \frac{2}{3}(m+1)k$, $\eta = \frac{m(m+1)}{2m+1}k$, $g \div \eta = \frac{4}{3}\big(1 + \frac{1}{2m}\big)$; also $g - \eta$ is
approximately $\frac{1}{6}mk$.
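
The method of multipliers can be set beside the direct average of all the differences, as in the following sketch. The function names are hypothetical; the first data set is the 52 death-rates as read from the table on p. 111, the second an equally spaced series chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations

def mean_difference(values):
    """g by the multipliers (n-1), (n-3), ... applied to a_n - a_1, a_(n-1) - a_2, ..."""
    xs = sorted(values)
    n = len(xs)
    total = sum((n - 1 - 2 * i) * (xs[n - 1 - i] - xs[i]) for i in range(n // 2))
    return Fraction(total, n * (n - 1) // 2)

def mean_difference_direct(values):
    """g as the plain average of all n(n-1)/2 differences."""
    pairs = list(combinations(values, 2))
    return Fraction(sum(abs(p - q) for p, q in pairs), len(pairs))

def mean_deviation_from_median(values):
    xs = sorted(values)
    n = len(xs)
    med = Fraction(xs[(n - 1) // 2] + xs[n // 2], 2)
    return sum(abs(x - med) for x in xs) / n

a = [244, 233, 226, 209, 206, 201, 196, 196, 196, 191, 183, 182, 182,
     181, 179, 177, 177, 177, 176, 176, 176, 174, 174, 174, 173, 173]
b = [136, 139, 141, 143, 144, 145, 149, 150, 151, 152, 154, 155, 159,
     160, 164, 165, 166, 166, 167, 169, 169, 169, 170, 170, 172, 172]
g_rates = mean_difference(a + b)

m, k = 10, 3                                   # illustrative spacing, n = 2m + 1
spaced = [k * i for i in range(2 * m + 1)]
g_spaced = mean_difference(spaced)
eta_spaced = mean_deviation_from_median(spaced)
print(float(g_rates), g_spaced, eta_spaced, g_spaced / eta_spaced)
```

Both routes give the same total of differences for the death-rates, and the equally spaced series reproduces the ratio $\frac{4}{3}(1 + \frac{1}{2m})$ exactly.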


If the instances are entered, not singly, but as $y_1$ cases at $a_1$,
$y_2$ cases at $a_2$, ... $y_t$ cases at $a_t$, where $y_1 + y_2 + \ldots + y_t = N$,
the working is more complicated. It can be shown that

$$
\begin{aligned}
g \times \tfrac{1}{2}N(N-1) ={}& y_t d_t (N - y_t) + y_{t-1} d_{t-1} (N - 2y_t - y_{t-1}) \\
&+ y_{t-2} d_{t-2} (N - 2y_t - 2y_{t-1} - y_{t-2}) + \ldots \\
&+ y_1 d_1 (N - y_1) + y_2 d_2 (N - 2y_1 - y_2) + y_3 d_3 (N - 2y_1 - 2y_2 - y_3) \\
&+ \ldots
\end{aligned}
$$

where the $d$'s in the first and second groups of terms are the differences
to quantities above and below the median respectively. The factors
are readily computed and arranged in a table.*
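
The grouped formula can be tested against a direct count on a small made-up example. This is a sketch under the reading of the formula given above; the function name, the values and the numbers of cases are all hypothetical.

```python
from itertools import combinations

def grouped_pair_sum(values, counts):
    """g x N(N-1)/2 for counts[i] cases at values[i] (values ascending),
    using differences from the median and the factors of the formula."""
    N = sum(counts)
    expanded = [v for v, y in zip(values, counts) for _ in range(y)]
    med = (expanded[(N - 1) // 2] + expanded[N // 2]) / 2
    total = 0
    below = 0                                  # cases in grades below the current one
    for v, y in zip(values, counts):
        above = N - below - y                  # cases in grades above the current one
        if v >= med:
            total += y * (v - med) * (N - 2 * above - y)
        else:
            total += y * (med - v) * (N - 2 * below - y)
        below += y
    return total, expanded

values = [10, 12, 15, 20]                      # illustrative grades
counts = [3, 1, 4, 2]                          # illustrative numbers of cases
total, expanded = grouped_pair_sum(values, counts)
direct = sum(abs(p - q) for p, q in combinations(expanded, 2))
N = sum(counts)
print(total, direct, total / (N * (N - 1) / 2))
```

The last figure printed is $g$ itself, the total divided by $\frac{1}{2}N(N-1)$.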

When measurements are distributed according to the normal curve
of error (Part II, Chap. II) we have the following relations:
$\eta = \sigma\sqrt{2/\pi} = \sigma \times .798\ldots$, $r = \sigma \times .6745$,
$g = \eta\sqrt{2} = \eta \times 1.414\ldots$ These
relations are often obtained approximately in other distributions.
Thus on p. 111, $\eta = .7\sigma$, $g = \eta \times 1.41$; but $r = .5\sigma$ only.

If, following Professor Gini's idea, we take the square root of
the average of squares of all the differences, we obtain (whatever
the distribution) the quantity $\sigma\sqrt{\dfrac{2n}{n-1}}$, or $\sigma\sqrt{2}$ very nearly.
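
The constants quoted for the normal curve, and the last statement about the squares of the differences, can be checked with the standard library alone. The six values below are made up for illustration; `NormalDist().inv_cdf(0.75)` is used here as the upper quartile of the unit normal curve.

```python
from itertools import combinations
from math import isclose, pi, sqrt
from statistics import NormalDist, pstdev

eta_over_sigma = sqrt(2 / pi)                  # mean deviation / standard deviation
r_over_sigma = NormalDist().inv_cdf(0.75)      # quartile deviation / standard deviation
g_over_eta = sqrt(2)                           # mean difference / mean deviation

xs = [2, 3, 5, 8, 13, 21]                      # illustrative values, any distribution
n = len(xs)
pairs = list(combinations(xs, 2))
rms_difference = sqrt(sum((p - q) ** 2 for p, q in pairs) / len(pairs))
expected = pstdev(xs) * sqrt(2 * n / (n - 1))
print(round(eta_over_sigma, 4), round(r_over_sigma, 4), round(g_over_eta, 4))
print(rms_difference, expected)
```

The agreement of the last two figures does not depend on the values chosen, which is the sense of the words "whatever the distribution".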

* The working of the formula here given differs in an unimportant way
from that used by Gini, loc. cit., p. 30 and foot-note on p. 29.

So far all the measurements of dispersion have been expressed
as concrete quantities, as so many shillings, years,
points on a scale, etc. It is sometimes advantageous to express
them in relation to a mean. Thus if the median and quartiles
of a wage group were 30s., 40s. and 50s., the quartile deviation
is ¼ of the median, while in another group, say 35s., 45s., 55s.,
it would be 2/9; it is, in fact, reasonable to regard the second
group as being less dispersed than the first, though their
quartile deviations are equal. Possible measurements of this
class are

(a) Quartile deviation ÷ Mean of quartiles,*
(b) Mean deviation ÷ Median,
(c) Standard deviation ÷ Arithmetic average;

but the only measurement at all
generally used is the standard deviation expressed
as a percentage of the arithmetic average (i.e.
(c) × 100) and this is called the coefficient of variation. In
the table on p. 111 it is 22.56 × 100 ÷ 173.63 = 13.0.
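
These relative measurements are simple ratios, as the following sketch shows for the two wage groups and for the table on p. 111. The function names are hypothetical.

```python
from fractions import Fraction

def quartile_deviation_over_median(lower_q, med, upper_q):
    """Quartile deviation expressed as a fraction of the median."""
    return Fraction(upper_q - lower_q, 2) / med

def quartile_deviation_over_mean_of_quartiles(lower_q, upper_q):
    """Measurement (a); it cannot exceed 1 while the quartiles are positive."""
    return Fraction(upper_q - lower_q, upper_q + lower_q)

def coefficient_of_variation(sigma, average):
    """Standard deviation as a percentage of the arithmetic average."""
    return 100 * sigma / average

first = quartile_deviation_over_median(30, 40, 50)
second = quartile_deviation_over_median(35, 45, 55)
cv = coefficient_of_variation(22.56, 173.63)
print(first, second, round(cv, 1))
```

The second wage group gives the smaller ratio, in agreement with the remark that it is reasonably regarded as the less dispersed.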

Asymmetry or skewness of a curve is indicated when the
mode, median and arithmetic average do not coincide. It is
shown more definitely when the sum of the positive deviations
from the median is not numerically equal to the sum of the
negative deviations; it is also shown when the quartiles, or
pairs of deciles, are not equidistant from the median. Any of
these inequalities could be made into a measurement
of skewness. Skewness, relating to the
shape, and not to the size, of a curve is appropriately measured
by an absolute quantity (resembling the eccentricity of an
ellipse), and we therefore need a ratio of two concrete
measurements. The simplest to compute is as follows: let $q_2$ be the
excess of the upper quartile over the median, and $q_1$ the excess
of the median over the lower quartile; then
$s = \dfrac{q_2 - q_1}{q_2 + q_1}$ is a
measure of skewness.† If the curve is symmetrical, $q_2 = q_1$
and $s = 0$; if $q_2 > q_1$, $s$ is positive, and if $q_2 < q_1$, $s$ is negative.
$s$ becomes $+1$, if $q_1 = 0$, that is if the median and lower
quartile coincide, and $s$ becomes $-1$, if $q_2 = 0$. $s$ is therefore a
measurement which never exceeds 1 numerically, and has a
definite significance at zero and at its extreme values. In the
table on p. 111, $q_2 = 9$, $q_1 = 13$, $s = -\frac{4}{22} = -.18$. In the
table of ages (see pp. 86 and 107) $q_2 = 10.54$, $q_1 = 8.35$, $s = .12$.
The significance of various values can only be obtained by
experience, but it may be suggested that .1 is a moderate
degree of skewness, and .3 a considerable degree.

It should be noticed that the three characteristics of a
group can be measured simply from the quartiles and median;
the median for the central position, the quartile deviation for the
dispersion, and the measurement just discussed for the skewness.

* In earlier editions I called this quantity the dispersion. It has the
advantage that it is necessarily not greater than 1.
† See also p. 251.
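
The measure of skewness from the quartiles may be sketched as follows, using the quartiles and medians already found for the death-rates and for the table of ages. The function name is hypothetical.

```python
def quartile_skewness(lower_q, med, upper_q):
    """s = (q2 - q1) / (q2 + q1), where q2 is the excess of the upper quartile
    over the median and q1 the excess of the median over the lower quartile."""
    q2 = upper_q - med
    q1 = med - lower_q
    return (q2 - q1) / (q2 + q1)

s_death_rates = quartile_skewness(159.5, 172.5, 181.5)
s_ages = quartile_skewness(33.45, 41.80, 52.34)
print(round(s_death_rates, 2), round(s_ages, 2))

# Extreme values: median coinciding with the lower, then the upper, quartile.
print(quartile_skewness(5, 5, 9), quartile_skewness(1, 5, 5))
```

The two extreme cases show the limits $+1$ and $-1$ mentioned in the text.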


Some Examples of the Application of Averages. 

If our analysis of the nature and use of averages is complete
and if averages are of widely extended use, we
should now be able to express almost any group
of figures by a few well-chosen numbers of definite significance.

To apply a somewhat severe test at first, let us choose
a familiar example from ordinary life, and consider how a
suburban business man might test the merits of
two railway systems, by one of which he intended
to take a season ticket.

The following table gives the train service between Leatherhead
and London in 1898:

Train Service: Leatherhead to London.

Number of Minutes to Journey.

Waterloo
Down: 60, 50, 52, 48, 47, 61, 50, 44, 48, 53, 45, 42, 45, 49, 43, 48, 42, 43.
Sundays: 50, 50, 47, 49, 50.
Up: 51, 46, 51, 48, 43, 44, 48, 48, 64, 45, 48, 47, 45, 47, 46, 47.
Sundays: 48, 48, 51, 51, 51.

London Bridge
Down: 67, 65, 65, 61, 74, 51, 5$, 66, 65, 53, 59, 41, 49, 44, 58, 57, 5$, 67, 80.
Sundays: 67, $2, 66, 68, 88, 65, 65, 68, 65.
Up: 69, 57, 53, 58, 54, 41, 58, 52, 42, 40, 55, 67, 79, 98, 69, 66, 68, 64, 71.
Sundays: 72, 71, 69, 70, 62, 81, 73, 73.

Victoria
Down: 77, 65, 55, 76, 77, 88, 48, 53, 46, 69, 89, 54, 82, 71, 90.
Sundays: 92, 45, 81, 84, 78, 61, 85, 83, 85.
Up: 87, 65, 69, 69, 47, 48, 51, 83, 101, 58, 62, 61, 76, 103.
Sundays: 81, 76, 80, 85, 85, 82, 94.

The following table gives us the necessary information : — 
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It is to be noticed that the statistical method is generally 

limited to one aspect of a problem; the question of punctuality
might, indeed, be easily treated statistically, but the questions
of comfort and relative picturesqueness of route will elude our
analysis.

The next example shows a method of throwing into
relief the characteristics of a typical group of sociological
data.

The adjoining table gives the wages recognised by the
Amalgamated Society of Engineers in many of
their branches in 1862 and 1891.

Amalgamated Society of Engineers. Wages in 1862 and 1891,
Weekly, exclusive of Overtime.

Branch.                  1862.              1891.
                         s. d.              s. d.
Accrington               27 0               31 0
Ashford                  33 6               30 0
Ashton-under-Lyne        29 3               34 0
Bacup                    26 1               28 0
Barrow-in-Furness        31 0               34 9
Bath                     29 0               31 0
Bedford                  27 0               29 0
Bilston                  28 0               30 0
Bingley                  24 0               29 0
Birkenhead               29 0               35 6
Birmingham               32 0               36 0
Blackburn                27 6               32 0
Bolton                   27 6               28 0 and 32 0
Bridgwater               24 6               24 0
Brighton                 24 11              29 0
Bristol                  31 0               32 0
Burnley                  27 0               30 0
Burton-on-Trent          25 0               30 0
Bury                     28 3               30 0 and 32 0
Cardiff                  31 0               34 0
Carlisle                 24 6               30 0
Chepstow                 30 0               34 0
Chester                  30 0               32 0
Chowbent                 26 0               32 0
Colne                    25 0               3? 0
Congleton                24 0               28 0
Coventry                 28 0               34 0
Crewe                    29 4               30 0
Darlington               25 0               31 6
Dartford                 34 0               38 0
Darwen                   27 0               32 0
Derby                    26 0               29 0
Doncaster                28 6               31 6
Dover                    35 6               36 0
Enfield Lock             35 0               40 6
Exeter                   ?3 0               28 0 and 32 0
Faversham                34 0               33 0
Folkestone               34 0               32 0
Frome                    24 0               27 0 and 30 0
Gainsborough             27 6               28 0
Glossop                  27 2               32 0
Gloucester               28 0               32 0
Grantham                 28 6               30 4
Grimsby                  28 0               32 0
Halifax                  23 1               31 0
Hanley                   28 3               32 0
Hartlepool               26 0               34 10
Heywood                  27 0               30 0 and 34 0
Holyhead                 32 0               28 0
Huddersfield             26 0               26 0
Hull                     27 6               34 0
Hyde                     30 0 and 28 0      28 0
Ipswich                  28 6               28 0
Keighley                 23 0               27 0
Kidderminster            28 0               30 0
Lancaster                25 0               32 0
Leeds                    25 0               30 0
Leicester                26 0               31 5
Leigh                    27 9               3? ?
Lincoln                  26 7               28 6
Liverpool                29 0               34 0
Llanelly                 22 0               26 0
Macclesfield             34 0               29 6
Manchester               29 9               35 0
Mexborough               27 0               32 0
Middlesborough           25 0               34 0
Middleton                29 5               33 0
Milton and Elsecar       28 0               34 0
Neath                    32 0               30 0
Newark                   25 0               29 0
Newcastle                2? 0               35 0 and 37 0
Amalgamated Society of Engineers. Wages in 1862 and 1891,
Weekly, exclusive of Overtime (continued).

Branch.                  1862.              1891.
                         s. d.              s. d.
New Holland              30 8               34 0
Newport                  30 0               32 0
New Town (Stockport)     29 0               32 0
Newton Abbott            33 0               33 0
Northampton              26 0               32 0
Northfleet               36 0               36 0
North and So. Shields    26 0               35 0
Norwich                  32 0               29 0
Nottingham               27 5               34 0
Oldbury                  28 0               34 0
Oldham                   29 0               33 0
Peterborough             28 6               33 0
Plymouth                 32 0               33 0
Pontypridd               24 0               30 0
Portsmouth               35 0               34 0
Preston                  27 0               32 0
Radcliffe Bridge         27 0               30 0 and ?? 0
Reading                  28 0               32 0 and 34 0
Ripley                   26 0               26 6
Rotherham                27 6               32 0
Rugby                    32 0               28 0 and 32 0
Rugeley                  24 11              30 0
St Helens                28 0               34 0 and 36 0
Sheffield                28 0               36 0
Shipley                  25 9               28 0 and 30 0
Shrewsbury               30 6               32 0
Smethwick                28 0               35 0
Southampton              32 0               34 6
Sowerby Bridge           24 6               30 0
Stafford                 34 0               30 0
Stalybridge              28 3               32 0
Stockport                28 0               32 0 and 34 0
Stockton-on-Tees         24 0               36 0
Stoke-on-Trent           29 0               32 0
Stroud and Thrupp        26 0               30 0
Swindon                  31 6               31 6
Todmorden                26 0               28 0
Wakefield                25 0               30 0
Warrington               28 0               34 0
Watford                  35 0               36 0
Wednesbury               26 0               3? 0
Whitehaven               25 0               28 0 and 36 0
Wigan                    28 0               34 0
Wolverhampton            28 0               33 0
Wolverton                29 2               29 0
Worcester                3? 0               30 0
Bermondsey               35 4               38 0
Blackwall                34 0               38 0
Bow                      36 0               38 0
Greenwich                34 0               38 0
King's Cross             36 0               38 0
Lambeth                  35 8               38 0
London, E.               35 0               38 0
London, N.               35 10              38 0
London, S.               35 0               38 0
London, W.               35 6               38 0
Marylebone               33 0               38 0
Stratford                35 0 and 33 6      38 0
Tower Hamlets            36 6               38 0
Woolwich                 36 0               38 0


The following figures show the same in brief:

                              1.           2.               3.
                            1862.*       1891.*           1891.†
                            s. d.        s. d.            s. d.
Maximum                     36 6         40 6              ...
Upper decile                35 0         38 0             38 0
Upper quartile              31 4         34 0             36 0
Median                      28 0         32 0             34 3
Arithmetic average          28 10        32 4             33 4
Modes                       28 0         30 0 and 32 0     ...
Lower quartile              26 0         30 0             31 6
Lower decile                24 6         28 6             30 0
Minimum                     22 0         24 0              ...
Quartile deviation           2 8          2 0              2 3
Skewness, from quartiles     .25          0               -.22

* Each branch counting as 1.

† The numbers of members in each branch counted as receiving the
wage recognised there.
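
The last two rows of the table can be checked against the quartiles and medians printed above them. The sketch below assumes that the quartile deviation is half the distance between the quartiles and that the skewness is (Q3 + Q1 - 2M) / (Q3 - Q1); these definitions are not restated in this passage, and the function names are illustrative only. Wages are converted to pence so that the arithmetic is exact.

```python
from fractions import Fraction


def to_pence(shillings, pence=0):
    """Convert a wage quoted as shillings and pence into pence (12d. = 1s.)."""
    return 12 * shillings + pence


def quartile_deviation(lower_q, upper_q):
    """Half the distance between the quartiles, in pence."""
    return Fraction(upper_q - lower_q, 2)


def quartile_skewness(lower_q, median, upper_q):
    """(Q3 + Q1 - 2M) / (Q3 - Q1); zero when the median lies midway."""
    return Fraction(upper_q + lower_q - 2 * median, upper_q - lower_q)


# (lower quartile, median, upper quartile) for the three columns of the table.
columns = {
    "1862, by branch": (to_pence(26, 0), to_pence(28, 0), to_pence(31, 4)),
    "1891, by branch": (to_pence(30, 0), to_pence(32, 0), to_pence(34, 0)),
    "1891, by member": (to_pence(31, 6), to_pence(34, 3), to_pence(36, 0)),
}

for label, (q1, med, q3) in columns.items():
    shillings, pence = divmod(quartile_deviation(q1, q3), 12)
    skew = float(quartile_skewness(q1, med, q3))
    print(f"{label}: deviation {shillings}s. {pence}d., skewness {skew:+.2f}")
```

Under these assumed definitions the three columns give deviations of 2s. 8d., 2s. 0d. and 2s. 3d., agreeing with the printed row.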
If the rates at each branch were not those actually paid to
all members, but their average, while the actual wages were
confined within small limits of that average, the figures in the
last column would be little affected.

On comparing columns 1 and 2 it will be seen that not
only have all the averages increased, but that since the lower
decile and quartile have increased more rapidly than the upper,
the lower half has also gained on the upper. Again, the wages
are grouped more closely in column 2 than in column 1.


[Side-note: Tabulation of descriptive answers.]
Group C of Tabulation. It was necessary to postpone
the tabulation of non-numerical or descriptive answers till we
had finished our discussion of averages. The following
detailed example shows how the median, etc., can be used to
give a short description of a large group of adjectival answers.

In 1891 the Amalgamated Society of Engineers obtained
from all their branches answers to the question: To what
extent is overtime worked? The branch secretaries sent
answers which may be tabulated as on next page.

[Side-note: Explanation of table.]
An inspection of the table here given will show sufficiently
the method of tabulation. The position of most of the answers
in an imaginary scale is fairly definite, except that it is not
always obvious where the numerical answers should be placed;
this must be decided either by internal evidence or practical
knowledge of the trade. The same adjectives did not of course
convey exactly the same numerical meaning to all the branch
secretaries who used them, but it will be admitted that this
tabulation gives a fairly clear view of the case, and that the
method of medians and quartiles may be appropriately applied.
Taking the member of a branch as the unit and neglecting the
unclassed answers, the median is "Maximum 18 hours in 4 weeks"
or "Moderately," the lower quartile "Very little," and the upper
quartile "14 hours when busy." Taking the branch as unit, the
median is "Not much," the quartiles are "Very little" and "When
necessary" or "Occasionally."

This method, which, with varying degrees of precision, is
widely applicable, seems to afford the only way of comparing
two such groups of answers. The precision attainable is to be
measured by the distance through which the median can be
shifted by making reasonable variations in the scheme of



Answers.                                 Number of     Number of
                                         Branches.     Members.
None                                          4            140
Not worked                                    1             78
Very little                                  23          4,836
To very limited extent                        1             63
Very occasionally                             1            350
A little on repairs                           1            500
Little                                        2             73
2 hours when necessary                        1             80
Seldom                                        1             59
Small extent                                  1             16
Seldom except on repairs                      1             66
Only on repairs                               2            216
Not much                                      6          1,125
On repairs                                    1            500
Not to any extent                             3            644
Not to a great extent                         2            162
Not general                                   1              7
Not systematically                            2             43
In cases of breakdown or emergency            7            606
2 hours regularly                             1            136
Chiefly on repairs                            1             20
Occasionally                                  2             90
When necessary                                1            348
Casually (sic)                                2            142
A good deal on repairs                        1             23
Maximum 18 hours in 4 weeks                   1          1,000
Moderately                                    3            262
Systematically in good trade                  1            200
Average about 5 hours a week                  1             96
Considerably in marine shops                  1            400
Systematically in dockyard                    1            650
General                                       2            146
Systematically                                1            693
Great amount                                  1            263
To a great extent                             1             72
Excessively                                   1            550
9 hours a week                                1             39
10 hours a week                               1            106
12 hours a week (maximum)                     1            700
14 hours a week (when busy)                   1            106
10 to 18 hours a week                         1          5,000

Total                                        83         20,666

Unclassed:
No answers                                   36          5,114
As little as possible                         1            250
Not so much lately                            1            160
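
The way a median or quartile is read off such a list of ordered answers can be sketched as follows: the answers are kept in scale order, their weights are accumulated, and the answer at which the running total first reaches the required share of the whole is taken. This is only an illustration under that assumption; the function name is hypothetical, and the sample holds just a few rows of the table, so its results are not those quoted in the text for the full list.

```python
def category_at(rows, fraction):
    """Return the answer reached after `fraction` of the total weight.

    `rows` is a list of (label, weight) pairs already arranged in scale order.
    """
    total = sum(weight for _, weight in rows)
    target = fraction * total
    running = 0
    for label, weight in rows:
        running += weight
        if running >= target:
            return label
    return rows[-1][0]


# (answer, branches, members): a few rows of the table, kept in its order.
sample = [
    ("None", 4, 140),
    ("Very little", 23, 4836),
    ("Not much", 6, 1125),
    ("Occasionally", 2, 90),
    ("Moderately", 3, 262),
    ("Systematically", 1, 693),
    ("10 to 18 hours a week", 1, 5000),
]

by_branch = [(label, branches) for label, branches, _ in sample]
by_member = [(label, members) for label, _, members in sample]

print("median, branch as unit:", category_at(by_branch, 0.5))
print("median, member as unit:", category_at(by_member, 0.5))
```

Even on this small sample the two choices of unit give different medians, which is why the text reports both.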
Now that we have the method of averages at our disposal
we may use it for tabulating and summarising a
group of figures.

Consider, for example, the answers to the questions issued
by the Commissioners on Trade Depression in 1886.

Four of the questions were:

1. Number of men in Society.

2. Number out of work in 1885.

3. Weekly wage in 1885.

4. Change in wages between 1865 and 1885.

The following table shows the answers given by the branch
secretaries of the Amalgamated Society of Engineers:


1.                     2.          3.          4.                        5.
District.              No. in      No. Out     Current Wages,            Wage change between 1865 and 1885.
                       District,   of Work,    1885.
                       1885.       1885.

Belfast                1,100       130         28/ to 36/                Slight increase.
Coventry               2,500       230         3?/6                      Contract work: 50 % decrease.
Dukinfield             170 +       20 +        31/                       Slight increase.
Dundee                 1,400       457         25/ skilled;              Time work: 1865, 22/; '7?, 24/; '80, 26/; '83, 24/; '85, 25/.
                                               15/ unskilled.
Glasgow                28,000      4,000       26/                       Time wages, 5 above 1864.
Glasgow (St Rollox)    1,600       250         31/6                      Rise in 1872-73 of 15 %; 1885 same as 1865.
Hartlepool             1,200       400                                   Advance of 3/.
Glossop                ...         10          3?/
Liverpool              38          ...                                   Rise in 1872-73 of 7½ %; 1885 same as 1865.
Monifieth              114         18          21/                       Skilled work: 1865, 24/; '76, 27/; '78, 25/; '83, 28/; '85, 25/.
Nottingham             4,000       600         34/ minimum.              1865, 28/; 1885, 34/.
Oldham                 1,600                   33/ average.              Increase of 5 %.
Oxford                 45                      33/
Paisley                800         ...         28/6                      1865, 26/; 1885, 28/6.
Preston                630         40          28/                       None.
Preston                900         120         28/                       None.
Shipley                201         15          28/6;                     1865, 28/6; 1869-73, 32/; 1885, 28/6.
                                               ?4/ non-unionists.
Sowerby Bridge         1,120       43          28/                       1865-75, 25/6; 1875-85, 28/.
Sunderland             3,200       400         33/                       1864, 27/; '74, 34/; 1875-85, between 31/ and 37/.
Swindon                6,050       2           31/6
Ulverston              45          ...         3?/                       1865, 26/; 1875, 31/.
Wednesbury             400         30          30/                       Increase of 2/.
Workington             170         70          28/ to 36/                Increase of 30 %.
It is suggested that the following are the summary tables
which should be inserted in a report dealing with the answers.

The figures are given here for only one society, but the
tabulations are framed so as to include all.

TABLE I. State of Employment.

Name of      Total Number* in       Number Out    Percentage    Median of the Percentages
Society.     Branches making        of Work.      Out of        Out of Work in the
             Returns on                           Work.         Various Branches.
             Employment.

A.S.E.       55,170                 7,143         13            12
O.S.B.
&c.

* Details of some of the most important branches should be added.
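
The last two columns of Table I answer different questions: the percentage out of work pools all members, whereas the median of branch percentages treats each branch as one observation. The sketch below illustrates the difference on five branches taken from the returns above; the selection of branches and the function names are assumptions made for the illustration, so the results are not the Society's totals.

```python
from statistics import median

# branch: (number in district, number out of work), both for 1885
branches = {
    "Belfast": (1100, 130),
    "Coventry": (2500, 230),
    "Nottingham": (4000, 600),
    "Sunderland": (3200, 400),
    "Wednesbury": (400, 30),
}


def pooled_percentage(data):
    """Out-of-work members as a percentage of all members returned."""
    members = sum(n for n, _ in data.values())
    out = sum(o for _, o in data.values())
    return 100 * out / members


def median_branch_percentage(data):
    """Median of the separate branch percentages, each branch counting once."""
    return median(100 * o / n for n, o in data.values())


print(round(pooled_percentage(branches), 1))
print(round(median_branch_percentage(branches), 1))

# The printed totals for the whole Society give the percentage in Table I.
print(round(100 * 7143 / 55170))
```

A large branch with heavy unemployment raises the pooled figure but moves the median no more than a small branch would.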


TABLE II. Current Wages.

Name of      Average of Wages in Branches.     Quartiles of         Measure of
Society.     Unweighted.      Weighted.        Branch Wages.        Dispersion
                                                                    (v. p. 116 (a)).
             s. d.            s. d.            s. d.    s. d.

A.S.E.       30 0             29 7             28 0     32 0        ?
O.S.B.
&c.


TABLE III.

A. Change of Wage between 1865 and 1885.

             Number of Branches showing              Median of      Percentages of Members in Branches showing
             No        De-       No                  Percentage     No        De-       No
             Answer.   crease.   Change.  Increase.  Increases.     Answer.   crease.   Change.  Increase.

A.S.E.       4         1         5        13         10             11        4         6        79
O.S.B.
&c.


Verbal Summary. In the great majority of cases a
considerable increase of wage took place between 1865 and 1885,
equivalent on the whole to a rise of about 10 per cent. The
figures are not sufficiently definite to give an exact average.

Table III. B. Change of Wage between 1865 and the
Maximum about 1873.

Table III. C. Change of Wage between Maximum about
1873 and 1885.

(Tabulation as in III. A.)



CHAPTER VII.

THE GRAPHIC METHOD.

1. General Purpose.

[Side-note: Averages and diagrams.]
The two main methods of elementary statistics which ought
to be understood by all students or officials who handle figures,
which are easily within the grasp of all independently of
mathematical training, but are generally misunderstood or ignored by
the uninterested or the uninitiated, are the method of averages
and the method of diagrams or the graphic method. These two
are placed together because the uses of averages and diagrams
are nearly related. When we deal with large and complex
masses of figures we are unable to grasp them in their entirety,
however clearly they may be tabulated. Any list of figures (the
populations of different towns, the death-rates at successive
ages, the wages of many workpeople, the imports for a series of
years) becomes less comprehensible as its length increases. A
series of ten numbers can, perhaps, be easily grasped, of twenty
only with an effort; while a printed list of figures for one
hundred successive years leaves hardly any impression on our
mind at all; we cannot see the wood for the trees. The test to
which all questions as to the use of averages should be referred
is that the averages selected should afford the best summary of
the whole group in question that the mind can grasp. When the
meaning of the word average was sufficiently extended, we found
that we could select three, four, or even ten suitable figures
which adequately showed the main features of any group. The main
use of diagrams is also to present large groups of figures so
that they shall be intelligible in their entirety, and the test
for all diagrams is that the diagram as drawn should afford the
best view of the series or group of figures that the eye can
appreciate. Diagrams have one use which averages have not, for it
is only by a diagram that a series of figures relating to
successive years can be adequately presented; but in reality they
are less essential than averages, for the latter often have an
existence independently of the figures from which they are
derived, representing true types of the quantities which are
being measured; and by their use alone are further comparisons of
complex groups made possible: while diagrams, on the other hand,
might be dispensed with, being auxiliary rather than essential,
merely an aid to the eye and a means of saving time.

[Side-note: Graphic representation and averages.]
To connect this chapter more closely with the preceding, we
will show how the same group of figures, for example the wages
of a large group of workpeople, may be represented by either
method.

Consider the following data:


Numbers of workpeople earning:

Wage.                 Number.     Total of group.
From 15/ to 16/          200
  "  16/ "  17/          400
  "  17/ "  18/          100
  "  18/ "  19/          100
  "  19/ "  20/          200          1,000
  "  20/ "  21/          200
  "  21/ "  22/          300
  "  22/ "  23/          300
  "  23/ "  24/          500
  "  24/ "  25/          900          2,200
  "  25/ "  26/        1,200
  "  26/ "  27/          800
  "  27/ "  28/          700
  "  28/ "  29/          500
  "  29/ "  30/          300          3,500
  "  30/ "  31/          300
  "  31/ "  32/          400
  "  32/ "  33/          400
  "  33/ "  34/          500
  "  34/ "  35/          500          2,100
  "  35/ "  36/          600
  "  36/ "  37/          400
  "  37/ "  38/          100
  "  38/ "  39/           80
  "  39/ "  40/           20          1,200


Using the method of averages we should replace this group
by the following figures:

                                s. d.
Average of all                  27 6
   "    "  lowest 1,000         17 0
   "    "  highest 1,000        36 6
   "    "  middle 4,000         27 0

or

Median, 26/9; quartiles, 24/2, 32/.
Deciles, 20/, 23/6, 24/9, 25/8, 26/9, 28/2, 31/, 33/4, 35/4.
Mode, 25/3; secondary positions, 16/6, 36/.

or

Persons earning from   15/ to 20/   20/ to 25/   25/ to 30/   30/ to 35/   35/ to 40/
Percentages of all         10           22           35           21           12
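
The median and deciles quoted above can be approximated from the graded table by supposing that the earners in each shilling grade are spread evenly across it. The sketch below makes that assumption explicit; the function names are illustrative, and since the printed figures are rounded to the nearest penny or so, a recomputed value may differ from them slightly.

```python
# Numbers earning 15/ to 16/, 16/ to 17/, ..., 39/ to 40/, as tabulated.
counts = [
    200, 400, 100, 100, 200,
    200, 300, 300, 500, 900,
    1200, 800, 700, 500, 300,
    300, 400, 400, 500, 500,
    600, 400, 100, 80, 20,
]


def grouped_quantile(counts, start, width, k, parts):
    """Wage below which k/parts of the earners fall, by linear interpolation."""
    target = sum(counts) * k / parts
    running = 0
    for index, count in enumerate(counts):
        if count and running + count >= target:
            return start + width * (index + (target - running) / count)
        running += count
    return start + width * len(counts)


def grouped_mean(counts, start, width):
    """Average wage, each grade being taken at its middle point."""
    total = sum(counts)
    weighted = sum(c * (start + width * (i + 0.5)) for i, c in enumerate(counts))
    return weighted / total


def as_shillings_pence(value):
    """Express a wage in shillings as (shillings, pence), to the nearest penny."""
    return divmod(round(value * 12), 12)


median_wage = grouped_quantile(counts, 15, 1, 1, 2)
print("median:", as_shillings_pence(median_wage))
print("deciles:", [as_shillings_pence(grouped_quantile(counts, 15, 1, k, 10))
                   for k in range(1, 10)])
print("mean:", as_shillings_pence(grouped_mean(counts, 15, 1)))
```

The median comes out at 26s. 9d. and the average at about 27s. 6d., in agreement with the figures given.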
[Side-note: Construction of simple diagram.]
This group is represented on the annexed diagram, an
example of the graphic representation of the relation between
two variable quantities. A figure similar to this may be used to
show marriage- or death-rates at different ages, numbers of
persons of various statures, demand at different prices, or any
such group of homogeneous quantities. The same construction can
be used to show the changing values of any number in a series
of years. Draw a line parallel to the bottom of the page, and
mark equal intervals to represent a quantity which can have
many successive small increments, such as age, income, height,
price, time, and so on. This is called the axis of abscissae,
and the distance of a point measured from the zero position
along the line is called its abscissa. At right angles to this
line, parallel to the side of the paper, through the zero position
we draw another, called the axis of ordinates, and grade this
to correspond to the numbers possessing the qualities
represented by the abscissae; at each grade on the axis of abscissae,
draw lines at right angles to it, to represent on the chosen scale
the numbers at that grade; these lines are called the ordinates.
In the annexed diagram the abscissae represent the amounts
of wages, the ordinates the number of persons earning them.
Join the tops of the ordinates by straight lines and the diagram
is complete. In practice, when squared paper is used, without
drawing the ordinates their tops can be marked.

[Side-note: Description of the wage diagram.]
This diagram shows at one glance the distribution of the
wage-earners according to their wages. A small number earned
between 15s. and 16s., a slightly larger group between 16s. and
17s., very few between 17s. and 19s. Above 19s. the number
continually rises; high numbers are found from 24s. to 27s., the
highest between 25s. and 26s. The line falls to the 30s. group,
but not so low as between 17s. and 19s., then it rises regularly
to 36s., and falls rapidly to 39s. Here, then, we have the main
group congregated in the neighbourhood of 25s., a distinct but
smaller group at 36s., and a small and nearly isolated group at
16s.; representing a considerable group of highly-skilled men
between 30s. and 40s., the great mass with ordinary skill between
20s. and 30s., and a small group of incompetents at 16s. These
features would not be so easily seen from the tabulated
figures.
It is to be noticed that the number tabulated as between
15s. and 16s. is represented by the ordinate at 15s. 6d., the
middle of the interval; if the original figures on which the
table was based had been given to the nearest 1d., the ordinate
should be drawn at 15s. 5½d. It is important that these middle
points should be accurately placed.

[Side-note: Continuity.]
The use of the line joining the tops of the ordinates is
twofold. First, it enables the eye to judge relative heights more
easily; and secondly, it suggests the idea of continuity, which
can be better illustrated by the next diagram. In this the
abscissae represent ages, the ordinates the estimated numbers of
persons living at and above the ages at which they stand per
thousand inhabitants of England and Wales at the middle of the
year 1891. The ordinates were drawn at the points on the axis of
abscissae representing the middle of each year of age; but length
of life cannot be expressed exactly in years, or even in months,
days, or minutes. The intention of the diagram is to show the
proportion living above each age, and for this purpose the
joining line should have no breaks or sharp angles, but should
suggest absolute continuity.

In practice, it is useless to mark in the points for smaller 
intervals than a year, for the eye could not grasp the detail. 
It is, however, implied that the line drawn has the same shape 
as that which would result if the number of persons was infinite 
and the subdivision by age infinitesimal. 


Estimated number per 1,000 of the population at and above:

Ages.            Ages.          Ages.          Ages.          Ages.
  0   1,000       16   628       32   346       49   152       65    47
  1     973       17   607       33   332       50   143       66    43
  2     949       18   587       34   318       51   135       67    38
  3     925       19   567       35   305       52   127       68    34
  4     901       20   547       36   292       53   119       69    31
  5     877       21   528       37   280       54   112       70    27
  6     854       22   510       38   268       55   104       71    24
  7     830       23   491       39   256       56    98       72    21
  8     807       24   474       40   244       57    91       73    18
  9     783       25   456       41   233       58    85       74    15
 10     760       26   439       42   222       59    79       75    13
 11     738       27   423       43   211       60    73       76    11
 12     715       28   407       44   201       61    67       77     9
 13     693       29   391       45   191       62    62       78     8
 14     671       30   376       46   181       63    57       79     6
 15     649       31   361       47   171       64    52       80     5
                                 48   161

Calculated from the Census of 1891.
[Diagram: Numbers per 1,000 of the Population above Assigned Ages.
Vertical axis: Numbers. Horizontal axis: Ages, 10 to 80.]


Apply these remarks to the diagram facing p. 127. Average
earnings for a year will not be reckoned exactly by shillings
or even pence; if we had a sufficient number of instances we
should get regular sequences of earners at successive farthings,
and the line representing them would have no sharp angles,
but be continually curved. The figure rightly gives the eye
this impression of continuousness. Similarly in the diagram
representing exports facing p. 134, the line correctly gives the
impression that exports are continuous day by day.

By an obvious step we may suppose that the unit of area,
that contained between vertical lines through two consecutive
divisions on the axis of abscissae, and horizontal
lines through two consecutive divisions on the axis
of ordinates, represents one wage-earner, and it is then easy
to see that the area contained between the base line, the curve,
and two vertical lines through the points marking any two
amounts of wage represents the total number earning rates
between those amounts.

Hence the lines (diagram, p. 127) through M, the position of
the median, Q1, Q3 those of the quartiles, D1, D2, D3, D4, M, D6,
D7, D8, D9 of the deciles divide the area ABm1m2m3CD into two,
four, and ten equal areas respectively. The centre of gravity
of this figure lies on the vertical line through V, the average
wage; and the feet of the ordinates through the highest points
m1, m2, m3 are at the modes.
When the grades in which the data are tabulated are wide
it is better to use the method of the next diagram, which we
may call a block diagram.

[Side-note: Graded data.]
This and the drawing underneath it illustrate the numbers
of married men distributed by age which are given on p. 86.
In that table we have no information except
that such a proportion are as old as twenty years
and not as old as twenty-five years, etc. This is precisely
represented by constructing a rectangle with base the interval
that represents five years, and height proportional to the
number recorded within that interval. The method of the
diagram facing p. 127 would suggest that all were at the
middle of the grade. In the case of ages we know that the
succession of numbers year by year ought to be continuous,
and a complete representation would be a continuous curve,
such that the area standing on a five years' interval equals
the area of the corresponding rectangle. Such a curve is drawn
free-hand on the diagram. If the figure is such as to leave
little margin of uncertainty as to the position of the curve
throughout, then the curve is an adequate representation of
the facts.

The data may also be represented by the lower diagram, 
where the crosses show the information as recorded in the 
table. These crosses are joined by straight lines ; the resulting 
figure may, if the phenomena are continuous, be replaced by 
a curve, which in this case would hardly be distinguishable 
from the straight lines. 

[Side-note: Requisite accuracy.]
The details of technique of diagram drawing, the position
of the scales, the devices for making the figure clear, and so
on, can be gathered from the various diagrams
given in this chapter. The degree of accuracy to
which the figures should be marked, whether correct to a
million, a thousand, or a unit, is determined simply by the
power of the eye to grasp detail; in most of those here given
it will be found that a displacement of one in a thousand is
perceptible, and this is the ordinary limit. More minute
accuracy is useless, for it is not the function of diagrams to
dispense with lists of numbers, but only to enable the eye to
perceive their significant features.

Before discussing the choice of scales on which the numbers
are to be represented, it is necessary to consider the ways in
which a diagram makes an impression on the eye. The eye
can judge: (1) distances; (2) ratios; (3) angles.

The dotted lines in the diagram facing p. 134 will
illustrate these points. (1) The eye is a fairly safe judge of
distances; there is very little doubt which of two points is the
further from the base line; when squared paper is used, a
difference of 1 in 1000 is perceptible. The eye can also judge
differences quickly. In the figure the value of the exports in
1883 exceeded that in 1885 by more than the value in 1890
exceeded that in 1883. (2) It can be seen that the value of
exports approximately doubled between 1862 and 1889; or
that the value in 1878 is about three-quarters of that in 1890.
The accuracy with which the eye can make such measurements
is not great; it is not easy to detect that the ratio of the values
in 1873 and 1871 (1.095 : 1) is greater than the ratio of the
values in 1882 and 1880 (1.073 : 1); but the general impression
given by the diagram is partly made up by unconscious
calculations of this nature. To make these observations accurately
the method described on pp. 169 seq. should be used. Notice
that for these observations the insertion of the base line is
necessary; and, because they are made unconsciously, a
diagram showing movements over a series of years without a base
line gives an incorrect impression. (3) The question, Was the
increment greater in 1886-87 or in 1887-88? can be more
quickly answered by observing the angles than by noting the
differences. The line showing the latter change is steeper
(makes a greater angle with the horizontal) than the line showing
the former. Hence the latter increase is the greater; actually
£12,600,000 against £9,200,000. The most useful exercise
of this power, however, is to judge the dates at which the rate
of increase changed; thus the value of exports increased in
1862-63, increased at a slower rate in 1863-64, and slower
yet in 1864-65, more rapidly in 1865-66; a slow fall followed
in 1866-67, then an increase began which is continually
accelerated to 1871, and so on. The line from 1872-76 is
concave to the base line, showing an accelerated fall; the
concavity from 1879 to 1882 corresponds to a retarded
rise. The increases so shown are absolute or actual, not
relative or in ratio to the quantities at the beginning of each
period.
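
The three kinds of judgment described here (differences, ratios, and angles) can be imitated numerically. The sketch below uses yearly export values in £ million for a few of the years mentioned, taken from the chapter's table of exports; the drawing scale passed to the angle function is purely hypothetical, and the function names are illustrative.

```python
import math

# Value of exports, in £ million, for a few of the years discussed.
exports = {1883: 239.8, 1885: 213.1, 1886: 212.7,
           1887: 221.9, 1888: 234.5, 1890: 263.5}


def difference(earlier, later):
    """Absolute change between two years, in £ million."""
    return round(exports[later] - exports[earlier], 1)


def ratio(earlier, later):
    """Relative size of the later value compared with the earlier."""
    return exports[later] / exports[earlier]


def angle_degrees(earlier, later, millions_per_year_width=10.0):
    """Slope angle of the joining line, on an assumed drawing scale in which
    one year is drawn as wide as `millions_per_year_width` £ million is tall."""
    rise = (exports[later] - exports[earlier]) / millions_per_year_width
    return math.degrees(math.atan2(rise, later - earlier))


print(difference(1886, 1887), difference(1887, 1888))
print(difference(1885, 1883), difference(1883, 1890))
print(angle_degrees(1886, 1887) < angle_degrees(1887, 1888))
```

The two increments come out at 9.2 and 12.6, matching the £9,200,000 and £12,600,000 of the text, and the steeper line belongs to the larger increment whatever positive scale is assumed.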

[Side-note: Choice of scale.]
It is difficult to lay down rules for the proper choice of the
scales by which the figure should be plotted out. It is only the
ratio between the horizontal and vertical scales
that need be considered. The figure must be
sufficiently small for the whole of it to be visible at once; if the
figure is complicated, relating to a long series of years and
varying numbers, minute accuracy must be sacrificed to this
consideration. Supposing the horizontal scale decided, the
vertical scale must be chosen so that the part of the line which
shows the greatest rate of increase is well inclined to the vertical,
which can be managed by making the scale sufficiently small;
and, on the other hand, all important fluctuations must be
clearly visible, for which the scale may need to be increased.
Any scale which satisfies both these conditions will fulfil its
purpose.

[Side-note: Necessity of correct base line.]
The page opposite shows the erroneous impressions
which can be given by a judicious manipulation of the scale
and by the omission of the base line. The diagrams, which
are drawn roughly, all represent the same estimates of wages in
England and in the United States of America for certain years
from 1860. Figure 1 sets the lines in proper relief. In Figure 2,
the base line is not drawn in the zero position
for the English scale, and the American scale is
reduced; the consequence is that English wages
appear to have fluctuated widely, while American made steady
progress. In Figures 3, 4, and 5 the scales are doctored and the
base line adjusted, so that in 3 American wages seem to have
caught up English, in 5 exactly the reverse is the case, while in
4 wages appear to have moved with equal rapidity in both
countries. An examination of these figures will show that the
eye cannot be trusted to supply the right base line, or to
estimate the importance of fluctuations without it; and, with
certain exceptions to be mentioned later,* it is well to distrust
all those numerous diagrams, where space has been economised
at the expense of the base line.
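
The effect of a displaced base line can be put in figures. The eye compares the plotted heights of two points, and those heights are measured from wherever the base line is drawn; the sketch below shows how the apparent ratio departs from the true one as the base line is raised. The wage values are hypothetical, chosen only for illustration, and the function name is not taken from the text.

```python
def apparent_ratio(first, second, base=0.0):
    """Ratio of the plotted heights of two values when the base line of the
    diagram is drawn at `base` instead of at zero."""
    if first <= base:
        raise ValueError("the first value must lie above the base line")
    return (second - base) / (first - base)


# Hypothetical weekly wages of 20s. and 25s.
true_ratio = apparent_ratio(20, 25)            # base line at zero
distorted = apparent_ratio(20, 25, base=15)    # base line raised to 15s.

print(true_ratio, distorted)
```

With the base line at zero the second wage stands one quarter higher than the first; with the base line raised to 15s. it appears twice as high, though the figures are unchanged.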

[Side-note: Smoothing of curves.]
We can now pass on to the consideration of the smoothing
of curves, for which purpose the question of the "alleged
stationariness of our exports," discussed by Sir R.
Giffen in his paper before the Royal Statistical
Society in 1899, affords an excellent illustration. The thin
dotted line on the diagram opposite shows the value of exports


* See pp. 135 seq. and p. 171, infra.
Total Declared Real Value of British and Irish Produce
Exported from the United Kingdom, 1 = £1,000,000.

                        Averages.
Year.     Yearly.    Three      Five       Ten
                     Yearly.    Yearly.    Yearly.
1855       95.7       ...        ...        ...
1856      115.8       ...        ...        ...
1857      122.0      111.2       ...        ...
1858      116.6      118.1       ...        ...
1859      130.4      123.0      116.1       ...
1860      135.9      127.6      124.1       ...
1861      125.1      130.5      126.0       ...
1862      124.0      128.3      126.4       ...
1863      146.5      131.9      132.4       ...
1864      160.4      143.7      138.4      127.2
1865      165.8      157.6      144.4      134.3
1866      188.9      171.7      157.2      141.6
1867      181.0      178.6      168.7      147.5
1868      179.7      183.2      175.2      153.8
1869      190.0      183.6      181.0      159.8
1870      199.6      189.8      187.8      165.9
1871      223.1      204.2      194.6      175.7
1872      256.3      226.3      209.7      188.9
1873      255.3      244.9      224.8      200.0
1874      239.6      250.4      234.7      207.9
1875      223.5      239.4      239.6      213.7
1876      200.6      221.0      235.1      214.9
1877      198.9      207.7      223.7      216.7
1878      192.8      197.4      210.9      218.0
1879      191.5      194.4      201.4      218.1
1880      223.1      202.5      201.3      220.5
1881      234.0      216.2      208.2      221.6
1882      241.5      232.9      216.7      220.1
1883      239.8      238.4      226.0      218.6
1884      233.0      238.1      234.3      217.9
1885      213.1      228.6      232.3      216.9
1886      212.7      219.6      228.0      218.1
1887      221.9      215.6      224.1      220.4
1888      234.5      223.0      223.0      224.5
1889      248.9      235.1      226.2      230.2
1890      263.5      249.0      236.3      234.3
1891      247.2      253.3      243.3      235.5
1892      227.1      245.9      244.2      234.1
1893      218.1      230.8      240.9      231.9
1894      215.8      220.3      234.3      230.2
1895      225.9      219.9      226.8      231.4
1896      240.1      227.3      225.4      234.1
1897      234.3      233.4      226.8      235.4
1898      233.4      235.9      229.8      235.3
1899      255.3*     241.0      237.8      236.1
1900      283.6*     257.4      249.3      238.1
1901      270.9*     269.9      255.5      240.5
1902      277.7      277.4      264.2      245.5
1903      286.5*     278.4      274.8      252.3
1904      296.3*     286.8      283.0      260.4
1905      324.4*     302.4      291.2      270.2
1906      367.0*     329.3      310.4      282.9

* Not including the value of ships exported.


year by year, and the first impression given by it is that exports 
have not grown in value in recent years. Sir Robert Giffen 
gave the following table : — 


Average Annual Value of Exports. 

1855-57 . . . . £134,000,000 
1865-67 . . . .  228,000,000 
1875-77 . . . .  264,000,000 
1885-87 . . . .  274,000,000 
1895-97 . . . .  292,000,000 


and from this he deduced “ that all through there is an increase, 
and that the only sign of stationariness is an increase at a less 
rate in the last periods than in the earlier periods.” 

The Saturday Review* wrote "that such a conclusion is 
grossly misleading," for the figures are merely triennial averages 
of selected years showing a happy coincidence; "why was not 
1898 included?" An inspection of the numbers does not show 
us the answer to this criticism, but on the diagram the whole 
circumstances are visible at a glance. Since 1865 three great 
waves have been completed. The maximum of 1872, due to the 
inflated prices of that year, is very high, but that of 1890 is 
greater than any previous figure, while the maximum in 1882 is 
comparatively low. The minima increase throughout; those 
of 1868, 1879, 1886 show a regular progression, which falls off 
greatly in 1891. In 1894-96 it looked as if another decennial 
cycle was in progress, but this was checked in 1897. Since 
the discussion, the returns for the successive years to 1906 
have shown an increase, surpassing that which preceded 1872. 

* January 1899, pp. 66, 67. 

The Saturday Review went on to ask why Sir Robert Giffen 
did not give "proper quinquennial averages," such as: 

Average Annual Value of Exports. 

1870-74 . . . . £235,000,000 
1880-84 . . . .  234,000,000 
1890-94 . . . .  234,000,000 
1898  . . . . .  233,000,000 

and it must be granted that this gives an appearance diametrically 
opposite to that of the previous table. 

It is clear that we need some general method of bringing 
these figures into a form which shall be quite independent of the 
choice of any special years. The diagram facing page 134 does 
this. The thin continuous line, lying almost over the dotted 
line of annual values, shows triennial averages taken yearly, 
that is the average of each year with those before and after it ; 
this line smooths off the corners without affecting the general 
appearance. The line of crosses shows quinquennial averages, 
each year being averaged with the two previous and two 
subsequent years. The line of circles shows decennial averages ; 
each circle is placed at the centre of the period whose average 
it represents; thus the circle showing the average of the ten 
years 1875-84 is placed vertically over the line separating the 
years 1879 and 1880.* 
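
The construction of the triennial, quinquennial and decennial lines can be sketched in a few lines of Python. This is only an illustration: the function name `moving_average` is hypothetical, and the figures are the yearly export values for 1855-64 as they can be read in the table of exports. Each average is keyed by the centre of its period, as the marks are placed on the diagram, whereas the printed table appears to enter each average against the last year of its period.

```python
# Yearly export values, 1855-64, unit £1,000,000, as read from the table of exports.
exports = {
    1855: 95.7, 1856: 115.8, 1857: 122.0, 1858: 116.6, 1859: 130.4,
    1860: 135.9, 1861: 125.1, 1862: 124.0, 1863: 146.5, 1864: 160.4,
}


def moving_average(series, span):
    """Average every run of `span` consecutive years.

    Returns a dict keyed by the centre of each period, so that the
    ten years 1855-64 are keyed by 1859.5, i.e. the line separating
    1859 and 1860. Assumes the years in `series` are consecutive.
    """
    years = sorted(series)
    averages = {}
    for i in range(len(years) - span + 1):
        window = years[i:i + span]
        centre = (window[0] + window[-1]) / 2
        averages[centre] = sum(series[y] for y in window) / span
    return averages


triennial = moving_average(exports, 3)
quinquennial = moving_average(exports, 5)
decennial = moving_average(exports, 10)
```

With these figures the first triennial average (1855-57) comes to about 111.2 and the decennial average for 1855-64 to about 127.2, which agree with the entries printed against 1857 and 1864 respectively.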

* In all the curves of averages the mark showing the average is placed at 
the centre of gravity of the marks showing the 3, 5, or 10 quantities averaged. 

On looking at the line of quinquennial averages it is clear 
that the Saturday Review did precisely what it accused Sir 
Robert Giffen of doing, for years are taken which favour the 
argument. The quinquennial periods selected for comparison 
with 1898 are all on the upper parts of the waves, the marks 
showing these averages are very near the maxima of the 
quinquennial line, while the year 1898 does not appear to be a 
maximum. We might with just as much or as little accuracy give 
the following: 

Quinquennial Averages of the Values of Exports. 

1865-69 . . . . £181,000,000 
1875-79 . . . .  201,000,000 
1885-89 . . . .  226,000,000 
1898  . . . . .  233,000,000 

and say that the value in 1898 was higher than any of the pre- 
vious selected averages. There is no need to use arbitrary dates 
to get at the facts. No argument can stand which does not take 
account of the cycle of trade, which is not eliminated till we 
take decennial averages. Special marks in the diagram show 
the averages for decennial periods, indicating a rapid increase 
before 1870, followed by steady slow progress till the subsequent 
expansion. The complete line gives just the same general 
appearance. If, finally, the figures were completely smoothed 
by a freehand line keeping as close to this as was possible, 
without making sudden changes of curvature, the same appear- 
ance would be given; the thick line on the diagram is an 
attempt to do this. The smoothing is obtained by the assump- 
tion that the cycle of trade is ten years ; when two maxima fall 
within the same ten years the average of this period by our 
construction gives the appearance of a maximum (e.g., in 1887) 
at a date of a minimum. This would be avoided if we con- 
tinually changed our period for averaging to accommodate 
the changing wave-length, a somewhat arbitrary proceeding. 
The difficulty thus arising can be easily corrected by the eye, 
and the final smoothed line is intended to convey this corrected 
impression. 

It should be clear now that it was in 1899 five years too 
soon to pay attention to the particular figure for 1898; the 
figures for the next five years, necessary to determine the char- 
acter of the coming wave, could not be foretold. When these 
are included it is seen that each decennial average (for 1890-99, 
1891-1900, etc.) established a new record, and that the figures 
for each year from 1900 to 1906 are greater than those of any 
previous maximum. It will be seen, moreover, that the sentence 
quoted from Sir Robert Giffen on p. 134 is fully justified. 
The smoothed line now constructed represents the general 
tendency of the value of exports, when accidental and temporary 
variations are removed. If it were possible to separate entirely 
variations of short period from secular changes, to separate the 
ebb and flow of the tide of commerce from the steady current of 
increasing trade, we may suppose that we should obtain a result 
represented by this line. In it there are no sudden changes even 
in rates of growth, while the addition and subtraction year by 
year of relatively small quantities would produce precisely that 
irregular fluctuating line from which the smooth line was 
obtained. 

The diagram can be continued from the following numbers: 

                          Averages. 
Year.   Yearly.    Three      Five       Ten 
                   Yearly.    Yearly.    Yearly. 
1907    416.0*     369.1      338.0      301.1 
1908    366.5*     383.2      354.0      314.4 
1909    372.3*     384.9      369.2      326.1 
1910    421.6*     386.8      388.7      339.9 
1911    448.5*     414.1      405.0      356.7 
1912    480.2*     450.1      417.8      377.9 
1913    514.2*     481.0      427.4      400.7 

* Not including the value of ships exported. 


The records during the war are not comparable with those here 
given. The reader is recommended to study the diagram as printed 
and to judge how far forecasts of amount, fluctuation and general 
movement are possible, before looking at the actual records of 
1907-13. 

The direction of the smooth line at any date may be called 
the trend of the series at that date. When the smooth line is 
approximately straight over several years, its general direction 
shows the trend in that period. 

A special method of determining the trend has been recently 
used by Professor Moore (Statistical Journal, 1919, p. 375). He 
assumes that the general movement over a stretch of years can 
be represented by the equation $y = a + bx + cx^2 + dx^3$, and 
determines the values of a, b, c and d by the condition that, if 
$y_t$ is the observed value at a date $x_t$, then 
$S(y_t - a - bx_t - cx_t^2 - dx_t^3)^2$ should be a minimum. 
Professor Persons (Review of Economic Statistics, Harvard, 
Preliminary Volume, No. 1) assumes that a straight line is 
sufficiently accurate and minimises $S(y_t - a - bx_t)^2$. It is 
doubtful whether either of these methods is of general 
application, and Persons' hypothesis in particular must be used 
with discretion. The method of moving averages (used in the text 
above) is certainly more sensitive for showing changes in the 
direction of the trend if a long series of years is under 
consideration, and the general causes which determine the 
phenomena have definitely varied several times. 
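
The least-squares condition used by Moore and by Persons can be illustrated as follows. This is a minimal sketch, not the computation either author published: the function name `fit_polynomial` is hypothetical, and the coefficients are found by forming the normal equations and solving them by ordinary elimination. Degree 1 gives Persons' straight line and degree 3 gives Moore's cubic. Dates are best measured from the middle of the period so that the sums of powers stay small.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [a, b, c, ...] of y = a + b*x + c*x**2 + ...

    Minimises the sum of squared differences between the observed ys
    and the polynomial, by solving the normal equations.
    """
    n = degree + 1
    # Normal equations: for each j, sum_k coeff_k * S(x**(j+k)) = S(y * x**j).
    matrix = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    rhs = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(matrix[r][col]))
        matrix[col], matrix[pivot] = matrix[pivot], matrix[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n):
            factor = matrix[r][col] / matrix[col][col]
            for c in range(col, n):
                matrix[r][c] -= factor * matrix[col][c]
            rhs[r] -= factor * rhs[col]

    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(matrix[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (rhs[r] - known) / matrix[r][r]
    return coeffs


# Yearly exports 1900-06 (unit £1,000,000), dates measured from 1903.
dates = [-3, -2, -1, 0, 1, 2, 3]
values = [283.6, 270.9, 277.7, 286.5, 296.3, 324.4, 367.0]
straight_line = fit_polynomial(dates, values, 1)   # Persons' form
cubic = fit_polynomial(dates, values, 3)           # Moore's form
```

The second coefficient of `straight_line` is the average yearly growth over the stretch. As the text warns, a single straight line cannot follow a trend that changes direction several times.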

The fuller discussion of "smoothing" series of figures belongs 
to the chapter on interpolation, but one other group may here be 
considered, as showing the use of the graphic method for 
obtaining regularity out of irregular raw material. Referring 
back to the figures given on p. 69, we can exhibit the wages of 
5000 workers anew by a diagram, in which the ordinates represent 
the numbers earning at or above a certain wage. The thin 
angular line on the adjacent page represents these numbers, 
entered for every 10-cent group. This plan is especially useful 
for irregular figures, like this wage-group, for the line must 
always tend upwards from the numbers earning the highest 
wage to the numbers earning at least the lowest. The diagram 
is also at once adaptable to the graphic method of finding the 
median described on p. 106. 

The irregularities shown by the thin line do not arise from 
any law of wage-grouping, but are due to the accidents of 
observation; if we regard these returns as samples out of a much 
larger unregistered group, we may suppose that a smoothed 
curve will indicate approximately the form which would be 
obtained, if our returns were complete. To smooth this figure, 
draw a freehand line passing as near the points as possible 
without abrupt changes of curvature, as in the annexed diagram. 
A new approximation may be made for the median, quartiles, 
etc., by drawing horizontal lines through the points on the 
vertical scale corresponding to half, one-quarter, 
three-quarters, etc., of the workers; from the points where 
these cross the smooth line, draw vertical lines to the scale of 
dollars; the points on the scale so obtained are the median 
(quartile, etc.) wage. 


The results obtained are: 

                                     Median.   Lower       Upper 
                                               Quartile.   Quartile. 
Given on p. 70                        $1.49      ...         ... 
By method of p. 106, used in 
  annexed diagram                     $1.49     $1.16       $2.12 
From smooth curve in annexed 
  diagram                             $1.51     $1.15       $2.13 
By method of interpolation, p. 227    $1.536     ...         ... 

[Diagram: Graphic Methods of Determining the Median and Modes. 
Horizontal scale: Daily Wage.] 
This method is not, however, one of great precision; a very slight 
change in the curvature of the smoothed line would make more 
difference than those shown between the second and third lines 
in the above table. 
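
The reading of the median and quartiles from such a diagram can be imitated numerically. The sketch below follows the angular line only, joining the plotted points by straight lines; it does not reproduce the freehand smoothing. The function name `wage_at` and the figures in `points` are hypothetical, invented for illustration, and are not the returns of p. 69.

```python
# Hypothetical returns: (daily wage in dollars, number earning at or above it).
points = [
    (0.50, 5000), (1.00, 4000), (1.50, 2500),
    (2.00, 1500), (3.00, 500), (4.00, 0),
]


def wage_at(points, target):
    """Wage at which the angular line falls to `target` workers.

    `points` must be in order of increasing wage. This is the
    numerical counterpart of drawing a horizontal line at `target`
    and dropping a vertical to the scale of dollars.
    """
    for (w0, n0), (w1, n1) in zip(points, points[1:]):
        if n0 >= target >= n1:
            if n0 == n1:
                return w0
            return w0 + (w1 - w0) * (n0 - target) / (n0 - n1)
    raise ValueError("target lies outside the range of the table")


total = points[0][1]
median = wage_at(points, total / 2)
lower_quartile = wage_at(points, 3 * total / 4)   # three-quarters earn at least this
upper_quartile = wage_at(points, total / 4)       # one-quarter earn at least this
```

Because the straight-line reading ignores the curvature of the smoothed line, it is subject to the same want of precision as the graphic method itself.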

This method is useful for determining the mode approximately. 
It will be remembered that the difficulties in doing this before 
arose from the uneven distribution on the two sides of the mode, 
and in the displacement of the mode by the adoption of a second 
system of tabulation. The first of these difficulties entirely 
disappears in the graphic method, while the second is diminished, 
for the displacement now only depends on the slight possible 
variations in the curvature of the smooth line. The mode is 
clearly the position where the greatest number is added, in the 
present method of representing the figures: that is, the mode is 
where the line, angular or smooth, is steepest. On the smooth 
curve the maximum steepness is where the tangent crosses the 
curve (in mathematical language, at a point of inflexion). This 
can be determined mechanically by placing a ruler to touch the 
curve, and turning it round the curve till it crosses it. On the 
annexed figure this occurs in the interval between $1.10 and 
$1.40. A more complex method of determining both mode and 
median is discussed in Chap. X, pp. 227-8. 
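
On the angular line the same rule, that the mode lies where the line is steepest, reduces to finding the interval in which the number of workers changes fastest per unit of wage. A rough sketch follows; the function name `steepest_interval` and the figures are hypothetical, and the result is only the modal interval, not the point of inflexion of a smoothed curve.

```python
# Hypothetical returns: (daily wage in dollars, number earning at or above it).
points = [
    (0.50, 5000), (1.00, 4000), (1.50, 2500),
    (2.00, 1500), (3.00, 500), (4.00, 0),
]


def steepest_interval(points):
    """Return the (lower wage, upper wage) interval where the line is steepest.

    Steepness is the fall in the number of workers per dollar of
    wage, so intervals of unequal width are compared fairly.
    """
    best_interval = None
    best_slope = -1.0
    for (w0, n0), (w1, n1) in zip(points, points[1:]):
        slope = (n0 - n1) / (w1 - w0)
        if slope > best_slope:
            best_slope = slope
            best_interval = (w0, w1)
    return best_interval


modal_interval = steepest_interval(points)
```

Dividing by the width of the interval is what allows the rule to be applied to returns given at irregular intervals of graduation.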

This graphic way of finding these means has two great 
advantages. It can be applied to numbers which are given 
at irregular intervals of graduation (e.g., 30 at 30s. 6d., 40 at 
30s. 8d., 35 at 40s. 1d., etc.) as easily and by exactly the same 
construction as to more regular returns; and if the smooth 
curve is carefully drawn, the number of modes can be seen at a 
glance and the individual importance of each can be estimated. 
In the annexed diagram, the curve is concave to the base line 
from $.30 to about $1.20, convex from about $1.20 to $3.15, 
concave till $3.40, and then convex till the end. The points of 
inflexion or the modes are where concavity gives way to 
convexity. Hence there are two modes, of which that near $3.40 
is of the less importance. 

A large class of diagrams may be passed by with a few 
words. Writers and lecturers frequently use points, lines, 
triangles, squares, circles, even pictures, of different sizes to 
assist the presentation of the relative magnitude of numbers. 
These have their use for popular lectures and hand-books, but do 
not add anything to the significance of the figures. Collections 
of these may be found in the second volume of Gabaglio's Teoria 
Generale della Statistica, and in M. Levasseur's La Statistique 
Graphique in the Jubilee Volume of the Royal Statistical Society. 

Of these one group may be signalled as of practical use. 
Rectangles may be used to express three quantities : one side 
to represent price; the adjacent side, quantity; and the area, 
value : or number of houses, average number of inmates and 
population : or number of hours’ work per week, average output 
or hourly wage, and total output or weekly wage. The figures 
on the annexed page show the limit to which this method can 
be usefully pushed. 


Representation of Three Facts by Rectangles. 


Imaginary budgets of an artisan and a labourer, showing amounts 
spent weekly on various commodities, and number of hours' work 
necessary for each amount. 


[Diagram: two rectangles, one for each budget. Vertical scale: 
number of hours per week. Artisan: £1 13s. 4d. per week, at 8d. 
per hour. Labourer: at 4d. per hour.] 

The horizontal scale represents pence per hour. .125 inch = 1d. 

The vertical scale represents number of hours per week. 
.1 inch = 2 hours. 

The areas represent amounts spent, and the whole rectangles 
show the week's wages on the same scale. 1 sq. in. = 13s. 4d. 


A Joint Committee on Standards for Graphic Representation 
has since 1916 worked at the best methods for presenting 
statistics graphically, and has made many useful suggestions 
which may produce uniformity in treatment and avoid 
errors. 

The use of statistical maps can only be afforded a brief 
notice here. Any numerical quality of a population, its 
density, average income, average taxation, may be shown 
district by district by suitable markings, or colours. Of these 
the most useful method is to choose one colour, say blue, for 
excess above the average; another, say red, for defect. Divide 
the districts in nine groups, say more than 7 per cent., 5 to 7 
per cent., 3 to 5 per cent., 1 to 3 per cent. above the average: 
these should be marked by four shades of blue, becoming lighter 
as the average is approached; within 1 per cent. of the average, 
above or below, should be white; and shades of red, gradually 
becoming darker, will show the remaining grades below the 
average. Care must be taken not to adopt too many grades. For 
examples of this method see Booth's Life and Labour of the 
People, maps; the Statistical Atlas of the XIth Census of the 
United States; the Statistical Atlas of India; and the maps in 
M. Levasseur's paper just mentioned. A cheap and very effective 
method, by which similar results are obtained in black and white 
only, may be seen on Plate P (misprinted 2) in that paper, and in 
the excellent chapter on Graphic Representation in Bertillon's 
Cours élémentaire de Statistique, p. 133 seq. 
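
The nine-fold grading just described can be set out as a rule. This is a sketch only: the function name `shade`, the numbering of the grades from 1 (lightest) to 4 (darkest), and the treatment of a district falling exactly on a boundary are assumptions made for illustration, since the text does not settle them.

```python
def shade(deviation_per_cent):
    """Colour and grade for a district, from its deviation from the average.

    `deviation_per_cent` is positive for excess above the average and
    negative for defect. Grade 0 is white; grades 1 to 4 run from the
    lightest to the darkest shade. A district exactly on a boundary
    is put in the grade nearer the average (an assumption).
    """
    size = abs(deviation_per_cent)
    if size <= 1:
        return ("white", 0)
    colour = "blue" if deviation_per_cent > 0 else "red"
    if size <= 3:
        grade = 1
    elif size <= 5:
        grade = 2
    elif size <= 7:
        grade = 3
    else:
        grade = 4
    return (colour, grade)


# Hypothetical districts with their percentage deviations from the average.
districts = {"A": 8.2, "B": 0.4, "C": -4.0, "D": -7.5, "E": 2.0}
markings = {name: shade(value) for name, value in districts.items()}
```

Four shades of each colour together with white give the nine groups, and no more grades are introduced than the text recommends.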

A common defect in maps of this class arises from the fact 
that records generally relate to administrative areas, while the 
phenomena to be represented are independent of these. An 
example will make this difficulty evident. If a map is made of 
England in 1911 colouring the counties according to the 
density of population, Cumberland will be marked by the 
colour appropriate to 27 persons per 100 acres, and 
Northumberland by that for 53 persons. The colour will change 
abruptly in a moorland region where for many miles the 
population is of a uniform sparseness. This difficulty can be 
overcome by either of two methods. Minute divisions, e.g., civil 
parishes, can be taken as the units, and each shaded in black 
only, the amount of pigment increasing with the population; or 
the population can be marked in situ as accurately as the data 
allow, a dot of uniform size being placed for each 100 people, 
with a modification of method for dense districts. A map of 
this kind is reproduced in Professor Secrist's Statistical 
Methods, 1917, p. 189. 


2. Historical Diagrams. 




Perhaps the chief use of diagrams is to afford a rapid view 
of the relations between two series of events. 

The different cases that occur are best illustrated by 
examples. The simplest is when we wish to compare two sets 
of figures expressed in the same unit, say £ sterling; 
and the simplest of these when we wish simply to 
compare a whole and its parts. 

On the adjacent diagram the upper line shows the annual 
total gross revenue of the United Kingdom (Statistical Abstract, 
1906*); the next line, that part which comes from inland 
revenue and customs, the difference being mainly composed of 
post office receipts. The principal heads of revenue are 
customs, excise, income tax, and post office. These are shown 
by suitable lines for each year, each line being independent of 
the other, and all having the same base line and being on the 
same scale. This method is greatly preferable to the alternative 
one of drawing a second line representing the total less customs, 
a third the total less customs and excise, and so on, because 
the eye is then quite incapable of judging the relative movements 
of the separate items. The figure shows at once the main 
features of the course of revenue. The increase has been rapid 
but irregular. The rapid growth in 1854-57 was not at once 
maintained, but the figures for the 60's are at a far higher level 
than those for the 50's. A rapid fluctuation in 1870 is followed 
by a more regular growth almost unchecked till 1887; and then, 
after a short stationary period, there are great increases in 
1895, and between 1898 and 1903. Nearly the same remarks apply 
to the line showing inland revenue and customs. If we look for 
the parts of the revenue that have borne the increase and change, 
we see that prior to 1900 receipts from excise had increased 
most, next those from the post office, and next those from the 
income tax, while the customs had diminished. Each line has its 
distinctive features. The post office payments show an almost 
regular growth. The income tax fluctuates violently, bearing 
the brunt of nearly all the rapid changes in the total, especially 
in 1856 and 1870, and 1900-02. The excise line shows a 
moderate increase till 1870, a sudden jump to 1874, and a very 
slow growth since that date. Customs, on the other hand, 
have to some extent taken an opposite course to that of excise, 
so that the total from the two had not changed very rapidly 
prior to 1900. At the top of the page a new base line is taken, 

* This diagram cannot be carried later owing to a change in the 
book-keeping of Imperial and Local taxation accounts. 
Revenue of the United Kingdom. 

Unit, in all columns, £10,000. 


Year ended 
31st March. 



Property 
and Income 
Tax. 

Post and 
Telegraph. 

560* 

216 

560* 

228 

550 * 

237 

570 * 

237 

580* 

252 

1,070* 

237 

1,520* 

281 

1,620* 

292 


292 

668 

320 

960 

33 * 

1,092 

340 

1,036 

35 * 

*,057 

908 

365 

3 8 * 

796 

AIO 

639 

425 

570 

447 

618 

463 

862 

466 

1,004 

477 

635 

527 

908 

543 

750 

583 

569 

700 

43 1 

6/9 

411 

7*9 

528 

730 

58 * 

746 

871 

757 

923 

777 

1,065 

830 

994 

863 

1,190 

901 

1,07a 

947 

1,200 

1,516 

966 

989 

*.590 

1,028 

*.444 

i, 060 

1,270 

i.iiS 

I » a 77 

*,*77 

*» 3 3 5 

1,226 

1,381 

*,263 

*.347 

1,288 

1,520 

*,30* 

1, 560 

*,334 

i,6io 

1,422 

*,665 

*.477 

*.725 

*, 5*8 

1,800 

*,875 

*,586 

*,665 

3,69a 

*,>25 

3 , 48 o 

*,779 

3,88o 

*,838 

3,080 

*. 9*5 

3 , *25 

*,993 

3**35 

3,101 


* These figures cannot be given accurately within £100,000. 
and the number of pounds per head of the population is shown 
year by year; it will be seen that the only important increases 
were between 1853 and 1857, and from 1898 to 1903. 

So far we have found no more difficulty in the choice of 
scales than previously when dealing with only one line, for all 
the lines on the larger diagram indicate millions of pounds, and 
when the unit is £1, a new base line has been adopted. But we 
may wish to show the change of population on the larger diagram. 
It is necessary, as we have already seen, to use the same base 
line for the two quantities to be compared; but we may choose 
any point for the beginning of the new line, adapting our 
vertical scale, for the eye can judge the proportionate changes 
wherever the line is placed. It is best to decide this point by 
defining the problem on which the comparison should throw light. 
If it is required to compare the growth of revenue with the 
growth of population since, say, 1850, we should start the new 
line at the point on the 1850 line where the revenue curve 
begins, and we can then see how the lines intersect one another 
again and again. Since 1850, however, is an arbitrary date, this 
plan lacks definition, and it is more logical to make the lines 
coincide at the most recent date given, with which any previous 
date can then be compared. On the diagram the line is drawn on 
such a scale that it lies fairly close to that for inland revenue 
throughout the greater part of its course. 

The next diagram, facing p. 146, introduces further 
difficulties as to the choice of scales. The object of the figure 
is to show the relations between quantity, value, and price of 
imported wheat, and population. The line A is first drawn on a 
scale chosen so as to throw its fluctuations into relief. 
Population is at once brought into relation with this by 
calculating the amount per head year by year. The line C to 
represent these figures is drawn on a different scale, chosen so 
that the line shall not cause confusion by continually crossing 
any of the others on the figure. If the figure was too full this 
could be treated as on p. 142, the revenue per head. The same 
scale of years must be used, and for simplicity of calculation 
and appearance, 100 lbs. consumed per head is measured by the 
same vertical distance as 10,000,000 cwt. imported. A and C refer 
to the same quantities, and therefore similar lines are used in 
both cases. The line B represents value and is shown by a broken 
line. For this line the choice of scale is more difficult. In the 
diagrams which follow, instances will be shown where special 
methods are used to bring out specific comparisons. Here this 
is not necessary, and a scale is adopted which brings the lines 
A and B into near relation, and shows the fluctuations of B, 
while the figure is made simple and intelligible by the 
representation of £20 by the same vertical distance as 20 cwt. 

The line D shows the changing price of wheat as deduced 
from columns A and B. The scale is chosen so that it boldly 
crosses the lines A and B; thus its fluctuations are clearly 
shown, and the numbers are easily seen, for 2s. per cwt. is 
represented by the same vertical line as 10,000,000 cwt. If the 
figure was accurately drawn, lines A and D would lie one over 
the other in 1876-77 ; they are therefore shifted very slightly 
horizontally, and clearness is preserved without the general 
impression being vitiated. 
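
The deduction of column D from columns A and B is a simple division, which may be checked as follows. Since A is in units of 100,000 cwt. and B in units of £100,000, the units cancel, and the pound is turned into 20 shillings. The function name `price_per_cwt` is hypothetical; the three rows are taken from the table of wheat importations.

```python
def price_per_cwt(quantity_a, value_b):
    """Average value in shillings per cwt.

    `quantity_a` is column A (unit 100,000 cwt.) and `value_b` is
    column B (unit £100,000); £1 = 20 shillings.
    """
    return 20 * value_b / quantity_a


# (year, A, B, D as printed) for three rows of the table.
rows = [(1862, 500, 286, 11.44), (1869, 444, 233, 10.50), (1895, 1073, 302, 5.63)]
checked = [(year, round(price_per_cwt(a, b), 2), d) for year, a, b, d in rows]
```

For these three years the computed price agrees with the printed column D to the nearest hundredth of a shilling.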

The lines in the diagram, elucidated by the table, suggest 
many characteristics and changes which call for explanation 
by students of economic history. The consumption of imported 
wheat per head increased for thirty years to 1895, and was then 
lower for some years. The quantity imported shows violent 
short-period fluctuations. The price after violent fluctuations 
from 1862 to about 1878 fell for seventeen years with little 
intermission. Here no doubt is shown the effect of many causes: 
an increasing population, the fact that wheat imported is 
complementary to the home product which is dominated by the 
English weather, the variation of harvests all over the world, 
political events, the fall in the value of silver, the development 
of communication and transport, etc. The function of the diagram 
is to show the general trends and the dates of change, but of 
course one cannot from it ascertain the causes. 

As regards the choice of markings for different lines, the 
chief rule is that lines which cross one another, unless very 
acutely, must be marked differently. The second rule is to 
mark similar quantities in similar ways. 

If it is possible to use more than one colour this principle 
can be easily carried out.* 

* See Wages in the Nineteenth Century, by the present author, diagram 
facing p. 90. 
Importations of Wheat and Wheat Flour, 1862 to 1906. 

Wheat flour is reckoned at its equivalent in grain. 

A. Total Quantities Imported. Unit, 100,000 cwt. 
B. Total Value Imported. Unit, £100,000. 
C. Quantity retained per Head of the Population (lbs.). 
D. Average Value of Wheat and Wheat Flour in Shillings per cwt. 

Year.      A.       B.       C.        D. 
1862       500      286      191     11.44 
1863       309      155      118     10.03 
1864       288      135      109      9.37 
1865       258      124       97      9.61 
1866       294      168      110     11.43 
1867       391      285      144     14.58 
1868       365      249      134     13.64 
1869       444      233      166     10.50 
1870       369      196      132     10.62 
1871       444      268      158     12.07 
1872       476      303      168     12.73 
1873       516      344      180     13.33 
1874       493      309      170     12.53 
1875       595      324      203     10.89 
1876       519      279      17?     10.75 
1877       635      407      212     12.82 
1878       597      342      197     11.46 
1879       730      400      239     10.95 
1880       685      393      222     11.47 
1881       713      407      229     11.42 
1882       808      449      257     11.11 
1883       851      438      269     10.30 
1884       669      301      210      9.00 
1885       823      337      256      8.19 
1886       670      261      207      7.79 
1887       802      314      245      7.82 
1888       804      315      244      7.82 
1889       789      311      238      7.88 
1890       824      327      246      7.94 
1891       895      396      265      8.85 
1892       956      371      281      7.76 
1893       938      308      273      6.57 
1894       967      268      277      5.54 
1895     1,073      302      305      5.63 
1896       996      309      279      6.21 
1897       887      330      247      7.44 
1898       944      377      259      7.99 
1899       985      330      267      6.71 
1900       986      334      266      6.78 
1901     1,011      334      270      6.60 
1902     1,079      360      288      6.67 
1903     1,167      397      309      6.80 
1904     1,182      415      310      7.02 
1905     1,142      413      296      7.23 
The following table contains numbers for continuing the 
diagram. 

Year.      A.       B.            C.        D. 
1906     1,127   [illegible]   290 lbs.   7.01 
1907     1,156      440        295  „     7.61 
1908     1,091      454        275  „     8.32 
1909     1,132      516        284  „     9.12 
1910     1,191      497        296  „     8.35 
1911     1,120      442        276  „     7.89 
1912     1,237      520        30?  „     8.41 
1913     1,225      502        296  „     8.20 


The general characteristics of a series in time are to be found 
in its trend and in the nature of its fluctuations, and such 
series may be classified as follows : — 

(а) With trend, in constant or gradually changing direction, 

Trend »nd and no fluctuations. Statistics of the population 
fluctuation*. c f a country are generally in this class. * 

(б) With random fluctuations ; that is, fluctuations of such 
a nature that when a movement (up or down) is recorded in a 
year it does not lead to any forecast as to whether the move- 
ment in the following year will be up or down. Ex. Annual 
statistics of rainfall. 

(c) With compensating fluctuations; that is, when an 
upward movement in one year is generally compensated by a 
downward movement in the following. Birth, death and 
marriage rates frequently show such compensation. 

(d) Undulatory ; that is, when after a maximum tfr crisis 
downward movements follow ^ne another for some years til\ a 
minimum is reached and then there are successive upward 
movements. General price statistics, and indeed that great 
mass of records which is related to the so-called commercial 
cycles, are of this nature. 

(e) Periodic ; that is, when every ten years or twelve 
months, or some other period, the sequence of ups and downs 
is repeated in the same order and (in some cases) the magnitude 
of the fluctuations is repeated. A seasonal example is given 
on pp. 159 seq. below. 

* There are also series where the records are equal over several years
and then move abruptly to another level and there remain for a time. Standard
time-rates afford an example of this kind.

In (b), (c), (d), and (e) a trend may be combined with the
fluctuations. We may also have random or compensated
fluctuations superimposed on an undulatory movement and
a trend ; ripples on the waves of a rising tide. When we have
a time series of records, it is very important to consider the
general nature of the trend and fluctuations shown, in order
to form a judgment of the near future. If fluctuations are seen
to be random and violent, we shall not be disturbed by a low
record, nor believe some remedial measures to be necessary.
In the case of compensating fluctuations, we shall anticipate a
high value after a low one. If the series is undulatory we shall
be prepared for a deferred recovery after the figures have once
broken from a high value.
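As a rough aid to the classification above, one may count how often a movement in one year is followed by a movement in the same direction in the next. The sketch below is only an illustration under stated assumptions: the function name and the two short series are invented for the purpose, and no threshold is proposed. A share near zero points towards compensating fluctuations, a high share towards an undulatory movement, and intermediate values leave the question open.

```python
def same_direction_share(series):
    """Share of successive movements that continue in the same direction."""
    # Year-to-year movements; years with no change are left out (a simplification).
    changes = [b - a for a, b in zip(series, series[1:]) if b != a]
    pairs = list(zip(changes, changes[1:]))
    if not pairs:
        raise ValueError("need at least two non-zero movements")
    same = sum(1 for first, second in pairs if (first > 0) == (second > 0))
    return same / len(pairs)


# Hypothetical records, invented purely for illustration.
compensating = [10, 12, 9, 13, 8, 12, 9]    # every rise is followed by a fall
undulatory = [1, 2, 3, 4, 3, 2, 1, 2, 3]    # runs of rises, then runs of falls

print(same_direction_share(compensating))
print(same_direction_share(undulatory))
```

A measure of this kind can only suggest the nature of a series; the records themselves must still be examined.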

3. Comparisons of Series of Figures. 

A. Before proceeding to the study of the next diagram, it will 
be well to define more exactly what is our object in comparative 
studies of figures, and to consider the means at our disposal. 

[Sidenote: Quantitative comparison.]
When dealing with two series of similar quantities such as
the course of trade or population in two countries, we wish to
see the general rate of progress (as can be done by
smoothing the curve), the years of special increase,
the dates of maximum and minimum, in fact to compare the
three things that the eye can see — the increase, the rate of
increase, and the dates of change of rate of increase. The most
obvious way to do this is to take the same scale and base line
for both countries and the same unit of measurement ; but this
method does not take us all the way. We can judge differences,
it is true, and the additions in all the years in both countries, and
we can see the highest and lowest points and dates of change of
rate of increase ; but we cannot compare rates of increase.
It is not easy to judge ratio, though a rough guess at it is
possible. Thus if the trade is very different in magnitude in the
two countries, equal absolute increments will mean very different
relative increments, and it is difficult to be always on one's guard.

[Sidenote: Percentage scales.]
The remedy for this is to alter the arrangement of scales.
Make a second figure, in which the unit shall be not a sum of
money, but a percentage : let 1 per cent. of England's
trade, say in 1850, be the unit for the
English line ; and 1 per cent. of the trade of Germany, at the
same date, for the German line. In other words, express the
trade of both countries as percentages of their value in a given
year, and draw lines to represent these percentages. Alongside
the diagram two or more scales can be placed showing the
absolute amounts of the trade of each country. Then the rates
of increase will be comparable, equal increments representing
equal percentages of the trade of each country in 1850 ; and,
in addition, the dates at which either country gained ground
relatively to the other can be easily picked out.

[Sidenote: Absolute or relative progress.]
The question
whether absolute rates or relative rates should be studied
is a very common one in statistics. Sometimes the absolute
magnitude should be known, as for instance when
we want to estimate the effect of measures which
affect the well-being of special classes, or the
trade of special countries ; sometimes the relative rate, as when
we want to watch the progressive increase of different industries,
or to be on our guard as to future competitors. The two studies
generally require two different diagrams though they may
represent the same numbers.

It will be seen that the chief difficulty lies in the choice of
the year in which the quantities are to be equated ; this must
be decided by the nature of the argument which the diagram
is to illustrate.


We may compare the following numbers —

Year      1880     1890     1900
A          220      440      330
B          160      240      400

in three ways, shown in the diagrams on p. 151.

In Figure 3 the fluctuations are seen as percentages of the
values at the last date, and are thrown into better proportion
than in Figure 1. It is frequently the case that the equating of
quantities at the most recent date throws what are often small
beginnings into their right proportion when viewed from the
modern standpoint. The statement that the values of B and A in 1880
were 40 and 67 per cent. respectively of the corresponding
present values, is in better perspective than the statement that
the values in 1900 were 250 per cent. and 150 per cent. of the
corresponding values in 1880 ; but circumstances must decide
in each case which method is to be adopted.

[Diagrams, p. 151 : the numbers for A and B drawn on percentage
scales, with the absolute scales for A and B placed alongside.
1. Expressed as percentages of values in 1880.
2. Expressed as percentages of values in 1890.
3. Expressed as percentages of values in 1900.]
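The three ways of equating the numbers for A and B can be checked by a short calculation. The following is a minimal sketch: the figures are those of the small table above, while the names `YEARS`, `SERIES` and `as_percentages` are introduced here only for illustration.

```python
YEARS = (1880, 1890, 1900)
SERIES = {"A": (220, 440, 330), "B": (160, 240, 400)}


def as_percentages(values, base_year, years=YEARS):
    """Express each value as a percentage of the value in base_year."""
    base = values[years.index(base_year)]
    return [round(100 * v / base, 1) for v in values]


# One line of output per series for each of the three base years.
for base_year in YEARS:
    for name, values in SERIES.items():
        print(base_year, name, as_percentages(values, base_year))
```

With 1900 as base the 1880 values appear as about 67 for A and 40 for B ; with 1880 as base the 1900 values appear as 150 for A and 250 for B.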



[Sidenote: Illustration from trade with Germany.]
These points are fully illustrated by the annexed diagrams,
the object of which is to analyse the progress of our trade with
our colonies and with foreign countries, especially
Germany. The first figure shows the total imports
and exports, and the parts of each which
are colonial and foreign, the scale in millions of pounds being
the same for all the lines. A line is also given for imports from
Germany, Holland, and Belgium ; these are grouped together,
because it was not possible till 1904 to distinguish in the returns


Imports and Exports, 1862-1905.

Unit in all columns, £100,000.

Columns, in order : Year ; Total Imports ; Total Exports
including Re-Exports ; Exports to British Possessions ;
Exports to Foreign Countries ; Imports from British
Possessions ; Imports from Foreign Countries ; Imports from
Germany, Holland and Belgium.
186a 

2,257 

1, 66a 

454 

*,207 

653 

1,604 

279 

1863 

2,489 

1,969 

550 

*, 4*9 

847 

1,642 • 

283 

1864 

a, 749 

2,126 

557 

1,569 

937 

X,8l2 

332 

1865 

2,711 

2,188 

5*5 

1,673 

728 

1,982 

364 

1866 

a ,953 

3,389 

572 

1,817 

722 

2,231 

388 

1867 

3,753 

2,258 

534 

*,734 

607 

2,144 

373 

1868 

3,947 

2,278 

537 

*, 74 * 

670 

2,277 

379 

1869 

3,955 

2,370 

5*9 

1,85* 

704 

2,250 

405 

1870 

3,<>33 

2 , 44 * 

554 

1,887 

648 

*,384 

409 

1871 

3 , 3 io 

2,836 

556 

2,280 

729 

2 , 5 8 * 

469 

1872 

3,547 

3,*46 

656 

3,490 

794 

2,753 

455 

1873 

3 , 7 i 3 

3 , 110 

711 

2,399 

810 

2,903 

463 

1874 

3 , 7 oi 

2,977 

779 

2,197 

822 

3,879 

494 

1875 

3,739 

2,816 

767 

2,050 

844 

a. 895 

5*5 

1876 

3,753 

2,568 

701 

x,866 

843 

2,908 

5*6 

18 77 

3,944 

2,533 

758 

1,766 

896 

3,«>49 

590 

1878 

3,688 

2,455 

720 

*.735 

779 

3,908 

575 

1879 

3,630 

2,488 

665 

*,823 

789 

2,840 

543 

1880 

4,112 

2,864 

8*5 

2,049 

925 

3.187 

616 

1881 

3,970 

3,971 

867 

2,104 

9*5 

3,055 

58a 

1882 

4,130 

3,067 

923 

2,143 

994 

3 , *36 

658 

1883 

4,269 

3,054 

904 

3,150 

987 

3,282 

69a 

1884 

3,900 

2,960 

883 

2,077 

958 

2,94a 

646 

1885 

3 , 7 io 

3 , 7*5 

885 

1,860 

844 

2,866 

638 

1886 

3,499 

2,690 

822 

*,867 

819 

2,680 

609 

1887 

3,622 

2,813 

823 

*,990 

838 

2,784 

646 

1888 

3,876 

2,986 

9*7 

2,068 

869 

3,007 

684 

1889 

4,276 

3,156 

908 

3,248 

973 

3.304 

a. 715 

1890 

4,207 

3,283 

945 

»,337 

962 

3,245 

694 

1891 

4.354 

3,091 

933 

3,158 

995 

3 , 36 o 

716 

1892 

4,238 

2,9*6 j 

812 

2,104 

979 

3,359 

7*5 1 

*893 

4,047 

3,771 

786 

1,986 

9*9 

3 , *28 

720 

1894 

4,083 

3,738 

786 

*,953 

940 

3 , *43 

716 

1895 

4,167 

3,858 

761 

2,098 

957 

3,210 

739 

1896 

4,418 

3,964 

90 7 

*,057 

933 

3,485 

761 

1897 

4,510 

3,941 

871 

3,071 

94 i 

3,569 

760 

1898 

4,705 

3,940 

901 

3,038 

998 

3,708 

786 

1899 

4,850 

3,295 

943 

3,353 

1,069 

3,781 

834 

1900 

5 , 33 i 

3,544 

X, 02 X 

2,523 

1,096 

4 , *34 

86x 

1901 

5,330 

3,479 

*,*33 

3,347 

*,057 

4,163 

897 

X902 

5,284 

3,493 

1,176 

2 , 3*7 

1,069 

4,215 

950 

1903 

5,426 

3,604 

*,*95 

2,409 

*,*37 

4,289 

973 

1904 . 

5 , 5*0 

3 , 7 *o 

X,2o8 

3 , 5°3 

1,200 

4 , 3 *o 

?6a 

1905 

5,650 

4,076 

1,227 

3,849 

1,279 

4,372 

990 


from the two latter home manufactures from German goods in
transit. It is not clear from this diagram which part of our
imports has increased most rapidly. The three lines are,
therefore, redrawn in the second diagram, on a percentage scale,
all the values being expressed as percentages of the corresponding
values in 1905. It is now seen that imports from
foreign countries and from our colonial possessions and India
have marched together except during the period of the cotton
famine, but the trade from Germany, etc., has increased more
rapidly than either. If we had equated the quantities in 1862,
the German line would have far outpassed the others by 1905 ;
but the impression given would be erroneous as regards absolute
quantities, for the increase was only £71,100,000 for the one,
while it was £277,000,000 for all foreign countries. The
remaining diagram shows the relative rates of increase for
Germany, Holland and Belgium, and the British possessions
respectively, since 1870.
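The contrast between absolute and relative increase in this illustration can be put in figures. The sketch below takes the 1862 and 1905 entries as read from the first and last rows of the table of imports and exports (unit £100,000) ; the names `UNIT`, `IMPORTS` and `growth` are assumptions made for the example.

```python
UNIT = 100_000  # the table is stated in units of £100,000

# (value in 1862, value in 1905), as read from the first and last rows of the table.
IMPORTS = {
    "Germany, Holland and Belgium": (279, 990),
    "All foreign countries": (1604, 4372),
}


def growth(first, last):
    """Return the absolute increase in pounds and the ratio of last to first."""
    absolute_pounds = (last - first) * UNIT
    ratio = last / first
    return absolute_pounds, ratio


for name, (first, last) in IMPORTS.items():
    absolute_pounds, ratio = growth(first, last)
    print(f"{name}: increase £{absolute_pounds:,}; 1905 is {ratio:.2f} times 1862")
```

The German group shows the larger ratio but much the smaller absolute increase, which is the point made in the text ; the increase of £276,800,000 for all foreign countries agrees with the round figure of £277,000,000 quoted there.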

The International Institute of Statistics has considered the 
possibility of standardising historical diagrams for comparison, 
and resolved at its meeting in 1911 that the average of the 
figures for the years 1901-10 should be taken as the standard 
and that this average should be represented by a vertical height
equal to the horizontal measurement that represented thirty
years. Diagrams drawn on this standardised scale can then
readily be compared with one another whatever quantities they
represent. It is not intended to prevent other comparisons
being made (as, for example, those on the diagram facing
p. 146), nor diagrams that represent series all expressed in the
same units (£ or tons) being drawn with the same natural unit.
The intention is that the standard should be adopted as the
only form where there is no reason to the contrary, and as an
alternative form in other cases. Comparison, especially of
international statistics, will be greatly facilitated if these rules 
are followed. 
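The rule of the Institute amounts to a single scale factor, which may be sketched as follows. This is an illustration only : the function name, the argument `year_width` (the horizontal length allotted to one year, in any drawing unit) and the sample figures are assumptions, not part of the resolution.

```python
def standard_heights(values_by_year, year_width, base_years=range(1901, 1911)):
    """Vertical heights such that the 1901-10 average equals thirty years' width."""
    base_average = sum(values_by_year[y] for y in base_years) / len(base_years)
    standard_height = 30 * year_width      # height given to the base average
    scale = standard_height / base_average
    return {year: value * scale for year, value in values_by_year.items()}


# Hypothetical series: 50 in each year 1901-10, and 100 in 1911.
example = {year: 50 for year in range(1901, 1911)}
example[1911] = 100

heights = standard_heights(example, year_width=2)
print(heights[1905], heights[1911])
```

Here the base average of 50 is drawn 60 units high, thirty years being 60 units wide, and the 1911 value of 100 is drawn 120 units high.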

[Sidenote: Causal relations.]
B. Series of figures are often compared graphically with a
view to discovering or illustrating causal relations. In such
cases we do not study relative growth only as in
the last diagram discussed, but look throughout
the period for any signs of resemblance in rates of growth, dates
of maxima and minima, or synchronism in any changes. The
methods by which such comparisons are made are difficult, and
need careful analysis. For instance, we may wish to consider
whether an increase of the allowance for outdoor relief is connected
with an increase of pauperism. In this case one line
will represent money, the other the number of persons, and there
is no common unit ; we need not calculate percentages, but
having chosen any scale for money, we can make equality in
any year by a simple adaptation of the scale for number. We
shall wish to learn first, whether an increase or decrease of
money occurred at, or just before, an increase or decrease in
number ; and secondly, whether the greater the increase of
one the greater the increase of the other. In order to show
direct connection, we shall try to make one line lie as nearly as
possible over the other.

[Sidenote: Construction.]
Draw a preliminary diagram in which both lines are entered
on any scales ; this will suggest the resemblances to be tested.
Notice in what period the fluctuations are greatest ;
this in general should be the period to be taken,
for it is here that the causal relations have had most play.
If any other period is chosen for any special reasons, these
should be made clear, for otherwise a critic may legitimately
object that it is only in this period that the connection is
distinct. There would be little difficulty in finding short
periods in any two curves where the fluctuations synchronised.
Take the averages of both money and of number over the
period chosen, and draw a second diagram in which the scale
for number is chosen by making this average for number equal
to the corresponding average for money. Any correspondence
between the two lines can be at once detected.
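The construction just described, in which the scale for number is chosen so that the two averages coincide, may be sketched thus. The figures are invented for illustration and the function name is an assumption ; the sketch shows only the arithmetic of the scale, not the drawing.

```python
def scale_to_equal_average(money, number):
    """Rescale `number` so that its average equals the average of `money`."""
    factor = (sum(money) / len(money)) / (sum(number) / len(number))
    return factor, [n * factor for n in number]


# Hypothetical figures for the chosen period (not taken from any return).
money = [40, 50, 60, 50]            # e.g. allowance, in some unit of money
number = [800, 1000, 1200, 1000]    # e.g. number of persons

factor, rescaled = scale_to_equal_average(money, number)
print(factor)
print(rescaled)
```

In this invented case the rescaled numbers fall exactly on the money line, which, as the text goes on to say, is very rarely found in practice.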

[Sidenote: Inverse relation.]
There are many cases when the changes in the magnitudes
which we regard as the causes are in the opposite direction
to those in the magnitudes which we regard as the
effects. For instance, if we are comparing trade
improvement with the number of unemployed, and make the
construction just described, the maxima of the first line would
synchronise with the minima of the second. Greater clearness
can be obtained by inverting one of the diagrams, plotting out
the number employed instead of that unemployed, and then the
changes should be in the same sense in both lines.

[Sidenote: More complex relations.]
In the above construction the lines will only lie one over the
other throughout their fluctuations, if the changes in one
quantity are in strict proportion to the changes
in the other, if an increase of 10 per cent. above the
average, for instance, in the allowance for outdoor relief corresponded
to one of 10 per cent. in the number of paupers. It
is very rare that such a simple relation is found ; all we can see
in general is that the maxima and minima occur at the same
dates, that the fluctuations agree throughout in sense in both
series, and that the greater fluctuations in the one correspond
to the greater fluctuations in the other.

[Sidenote: Use of diagrams.]
Diagrams may often be used to suggest correlation between
two series of figures, and this indeed is one of their chief merits,
and they may be used to illustrate arguments on
the subject, but at this point their utility ends, for
they cannot be made to prove much. Causal relations are very
difficult to establish, and the original figures must be critically
consulted when theories are to be brought to the test.

[Sidenote: More exact method.]
We have not yet exhausted the power of diagrams for
making such comparisons, but the following method must be
applied only with great caution. Suppose that we
wish to ascertain whether an increase of 1 bushel in
the quantity of wheat to be bought for a sovereign corresponded
to an increase of 1.5 in the marriage rate per 1000, or any
such strict numerical proportion. Draw a diagram representing
the quantities of wheat, take the average for the period chosen
for comparison, and write the scale so as to read 1, 2, 3 . . .
bushels above or below the average. Draw no base line. Now
enter a line to represent the excess or defect of the marriage
rate from its average in the chosen period, on a scale such that
1.5 in excess is represented by the same vertical distance as
1 bushel. The closeness of the two lines would test to what
extent the theory was valid. The danger of this method is,
that with no base line there is no possibility of judging the
amounts of the changes relative to the totals. The insertion
of the necessary two base lines would confuse rather than
aid.

It is clear from the preceding analysis that, by the choice
of scales and base lines, the points at any two dates may be
made to coincide on any number of accurately drawn lines 
representing series of figures. 

The preceding paragraphs are completely illustrated by the 
adjoining diagram. 

[Sidenote: Illustration of method.]
In Figure I are given lines representing the price of wheat in
shillings per quarter, the total of values of exports and imports
divided by the population, and the marriage rate
per 1000. The scales chosen are simply those
which are easiest to use, and throw the lines into proper relief.



Marriage Rate, Total Exports and Imports per Head of
Population, and Average Price of Wheat per Quarter.

         Marriage    Total Exports and      Average Price of
Year.    Rate.       Imports per Head.      Wheat per Quarter.
                       £    s.   d.             s.   d.
1860     17.1         13    0    8             53    3
1861     16.3         13    0    3             55    4
1862     16.1         13    8    0             55    5
1863     16.8         15    2    7             44    9
1864     17.2         16    8    7             40    2
1865     17.5         16    7    5             41   10
1866     17.5         17   14    5             49   11
1867     16.5         16    9    6             64    5
1868     16.1         17    0    6             63    9
1869     15.9         17    3    9             48    2
1870     16.1         17   10    3             46   10
1871     16.7         19    9    6             56    8
1872     17.4         21    0    0             57    0
1873     17.6         21    4    2             58    8
1874     17.0         20   11    0             55    8
1875     16.7         19   19    4             45    2
1876     16.5         19    0   10             46    2
1877     15.7         19    5?   5             56    9
1878     15.2         18    2    1             46    5
1879     14.4         17   16   10             43   10
1880     14.9         20    3    3             44    4
1881     15.1         19   17    5             45    4
1882     15.5         20    8   10             45    1
1883     15.5         20   13    2             41    7
1884     15.1         19    4    1             35    8
1885     14.5         17   16    9             32   10
1886     14.2         17    0   10             31    0
1887     14.4         18   11    7             32    6
1888     14.4         18   12    1             31   10
1889     15.0         19   19    9             [illegible]
1890     15.5         19   19    7             31   11
1891     15.6         19   14    0             37    0
1892     15.4         18   15    6             30    3
1893     14.7         17   14    9             26    4
1894     15.1         17   11    9             22   10
1895     15.0         17   19    3             23    1
1896     15.8         18   14    1             26    2


The points in each scale for the same years are over one another, 
but the scales differ. The base lines need not coincide. 

[Sidenote: Marriage rate and trade.]
We can see at a glance whether there is resemblance between
the courses of these figures. There is at any rate a general
correspondence between the fluctuations of trade
and of the marriage rate since 1870, and possibly
earlier. There are points of likeness between wheat prices
and trade ; in 1870-73 both rise together, and fall in 1873-75 ;
both rise in 1876-77, fall in the following two years, and then
rise again ; both fall from 1881 to 1886 and then rise. There
are also many cases in which the motions do not agree, especially
1862-64, and 1887-89.

[Sidenote: Marriage rate and wheat.]
If we look now at the price of wheat and the marriage rate,
which in the earlier part of the century used to be closely
related, the one rising when the other fell, we see
that there is no great resemblance either in this
or the contrary sense. In 1860-62 and in 1862-64 wheat rose
and fell, while the marriage rate fell and rose ; wheat rose in
1865-67, while the marriage rate was first stationary and then
fell a little ; then it continued to fall in 1868-70, though wheat
was falling also ; in 1870-80 the marriage rate shows one long,
wheat two short, fluctuations. Since 1880, in years in which
wheat fell, the marriage rate in general fell also and vice versa.

[Sidenote: Connecting links.]
Let us consider for a moment the possible links of connection
between these phenomena. When wheat was the chief
object of expenditure of the working class, its
price was the chief thing for them to consider ;
and so when wheat rose the marriage rate fell. On the other
hand, now that wheat is cheap and wages higher, a change in
the price of the loaf is only of great importance to a minority ;
it is now the general prosperity of the country, well indicated by
the condition of foreign trade, that raises the marriage rate.

When exports and imports are increasing in value, trade is
stimulated, and in spite of rising prices, marriageable people are
sanguine that the prosperity will remain and the prices fall ; but
when the prices fall, so do the profits and incomes, and marriageable
people are more prudent. For these reasons we may
expect the marriage rate and foreign trade lines to resemble
each other.

Now since the increase of the marriage rate corresponds to an
inflation of trade, and an inflation of trade to a time of rising
prices in general, we shall find the price of wheat in particular,
which is connected with the course of prices in general, rising
when trade is inflated and falling when it is depressed, and
therefore rising and falling with the marriage rate. But since
the price of wheat is influenced also by special causes, it will not
always correspond to the state of trade, and still less to the
marriage rate, with its former tendency to opposite variations.

There is no need then for surprise that the curves of marriage
rate and trade correspond ; that wheat and trade correspond,
but less closely ; and that wheat and marriage show a double
tendency. The correspondence between marriage and trade is
investigated on the diagram. That between wheat and trade
should be done on an identical method. Marriage and wheat
should be compared twice on different plans : first for direct
correspondence, and then by redrawing the wheat curve with
its base line at the top for inverse correspondence.

[Sidenote: Construction of diagram.]
To effect the comparison between the course of trade and
the marriage rate, the following steps are taken. On examining
the two curves on the first figure, it is seen that
the resemblance does not begin before 1869 ;
the parts of the curves since 1869 should therefore be brought
into close correspondence. The average marriage rate, 1869-96,
is 15.5, and average imports and exports per head, £19. The
marriage curve is drawn in the ordinary way ; then with the
help of a sliding scale the trade curve is put in, so that with
the same base line £19 falls on the 15.5 line in Figure II.

The result is that the curves are seen to rise and fall at the 
same dates, but not to the same extent ; for, while the lines 
keep nearly parallel from 1873 to 1879, the falls from the 
maximum being equal, after 1879 the trade line fluctuates 
further above and below its average than the marriage rate does. 

[Sidenote: Final comparison.]
It remains to test graphically whether the changes are
proportional to one another. An equation of scales may be
obtained by equating the mean deviation (£1.04)
of imports and exports from their average 1869-96,
with the mean deviation of the marriage rate (.72) from its
average in the same period ; or roughly taking the same vertical
scale to represent £1 of imports and .7 in the marriage rate.
This is making the hypothesis that a change of £1 in the total
trade per head synchronises with a change of .7 in the marriage
rate per thousand. The scales so chosen are marked above
and below the common average line in Figure III.
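The equation of scales by mean deviations can be verified in part from the table. In the sketch below the marriage rates for 1869-96 are those read from the table above, and the mean deviation of trade, £1.04, is taken from the text rather than recomputed ; the variable and function names are assumptions made for the example.

```python
# Marriage rate per 1000, 1869-96, as read from the table.
MARRIAGE_RATE_1869_96 = [
    15.9, 16.1, 16.7, 17.4, 17.6, 17.0, 16.7, 16.5, 15.7, 15.2,
    14.4, 14.9, 15.1, 15.5, 15.5, 15.1, 14.5, 14.2, 14.4, 14.4,
    15.0, 15.5, 15.6, 15.4, 14.7, 15.1, 15.0, 15.8,
]
TRADE_MEAN_DEVIATION = 1.04  # £ per head, as quoted in the text


def mean_deviation(values):
    """Average absolute deviation of the values from their own average."""
    average = sum(values) / len(values)
    return sum(abs(v - average) for v in values) / len(values)


average_rate = sum(MARRIAGE_RATE_1869_96) / len(MARRIAGE_RATE_1869_96)
rate_deviation = mean_deviation(MARRIAGE_RATE_1869_96)
# Change in the marriage rate drawn to the same height as £1 of trade per head.
rate_per_pound = rate_deviation / TRADE_MEAN_DEVIATION

print(round(average_rate, 1), round(rate_deviation, 2), round(rate_per_pound, 1))
```

On these figures the average is about 15.5, the mean deviation about .72, and the ratio roughly .7, in agreement with the values used in the text.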

It is now seen that the fluctuations since 1870 lie more
closely together in the two curves, but that this closeness has
been obtained by the partial sacrifice of the years before 1870. 
A yet shorter period, 1879-93, would show a very close agree- 
ment; but so special a selection would vitiate any general 
argument. 

Our conclusion is, that since 1870 the causes which affect
foreign trade have also affected the marriage rate at the same
dates and in the same sense, and that the more marked the
effects on the one, the more marked are the effects on the other
also, but that there is no law of simple proportion between them.

Instead of making comparison of the deviations from the
average of a period, it is legitimate and often advantageous
to measure the deviations from a smooth curve, whether
obtained from moving averages or by some other method.
We are then ignoring the causes which have a gradual and
permanent effect, and comparing the short-period fluctuations.

We return to this subject below (Part II, end of Chap. VI).
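Deviations from a smooth curve obtained by moving averages may be computed as in the following sketch. It assumes a centred moving average over an odd number of terms, which is one common choice and not necessarily the one intended in the text ; the function names and the sample series are introduced only for illustration.

```python
def moving_average(values, window):
    """Centred moving average over an odd number of terms."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


def deviations_from_trend(values, window):
    """Differences between the records and their centred moving average."""
    half = window // 2
    trend = moving_average(values, window)
    # The first and last `half` records have no centred average and are dropped.
    return [v - t for v, t in zip(values[half:len(values) - half], trend)]


steady_growth = [1, 2, 3, 4, 5]   # hypothetical series with a trend and no fluctuation
print(moving_average(steady_growth, 3))
print(deviations_from_trend(steady_growth, 3))
```

A series that grows steadily shows no deviation from its moving average, so that only the short-period fluctuations of a real series would remain for comparison.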

Note. — The relations tested in Figure II may be represented
by the equation x/a = y/b, and in Figure III by (x - a)/(y - b) = c (a
constant), where x and y stand for the value of trade and the
marriage rate, and a and b for their average values, and c is
chosen so as to make the average fluctuations of the two sets
of quantities equal. By the method of least squares c could
be chosen so that the correspondence should be closer than
with the value given by the calculation in the text.
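The two ways of choosing c mentioned in the Note may be compared numerically. The sketch below rests on assumptions that should be stated plainly : the relation is read as x - a = c(y - b), the least-squares value is taken to be the c that minimises the sum of the squares of (x - a) - c(y - b), and the figures are invented for the example.

```python
def proportional_constants(x, y):
    """Return c by equal mean deviations and c by least squares."""
    a = sum(x) / len(x)
    b = sum(y) / len(y)
    dx = [v - a for v in x]
    dy = [v - b for v in y]
    # c that makes the average fluctuations of the two series equal.
    c_mean_deviation = sum(abs(d) for d in dx) / sum(abs(d) for d in dy)
    # c that minimises sum((dx - c * dy) ** 2).
    c_least_squares = sum(p * q for p, q in zip(dx, dy)) / sum(q * q for q in dy)
    return c_mean_deviation, c_least_squares


# Hypothetical values of trade per head (x) and of the marriage rate (y).
x = [17.0, 19.0, 21.0, 20.0, 18.0]
y = [14.4, 15.5, 16.7, 15.9, 15.0]

c_md, c_ls = proportional_constants(x, y)
print(c_md, c_ls)
```

The two values differ slightly in this invented case ; which is preferable depends on the purpose of the comparison.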

4. Periodic Figures. 

[Sidenote: Periodic figures.]
We now come to the consideration of periodic figures ; that
is, of figures which within a given period, in a year for instance
when returns are monthly, reach maxima and
minima at assigned times, and show fluctuations
recurring with regularity in successive periods. In physical
phenomena, such as the sunrise, the same daily numbers will
represent the phenomena, almost without change, year after
year. In the case of the tides we find a link between the
more rigid annual curves of seasonal phenomena, and the less
marked periods of social statistics ; for the tides are subject to
separate influences with periods of 24 hours, 24 hours 50 min.,
29 days, 1 year, and others, and the effects of these influences
are often masked one by the other. In the weekly figures of
the Bank of England, Jevons discovered monthly, quarterly,
and annual periods.*

* See Investigations in Currency and Finance.

In social and industrial statistics we usually find an annual
period, combined with a general slow movement upwards or
downwards, and confused by an irregular period of about ten
years, due to alternate inflation and depression of trade. The
influences of these three movements on the resulting numbers
can be investigated, and the general methods of examining
periodic figures fully explained by the complete discussion of one
example, viz., the monthly returns of want of employment of
the Friendly Society of Ironfounders. For another example the
reader is referred to Jevons’ essay, On the Frequent Autumnal
Pressure in the Money Market ; * and for an exercise, to the
monthly gazette wheat prices, where the gradual change of the
shape of the annual diagram can be traced in relation with
the increasing influence of harvests in all the quarters of the
globe.

[Sidenote: General features of the figures.]
These figures are specially suitable for showing graphically
a double period, and the influences of rapid annual fluctuations
and general movements of longer period on each
other. Looking at the table on p. 161 along the
lines for the several years, we shall see that there is always a
fall in the middle of the year. Looking down a vertical column
under any month, it will be seen that there is no generally
marked tendency towards increase or diminution, for high and
low numbers occur in the first as well as the last few years.
The most noticeable feature of these figures is the alternation
of groups of years of high and of low numbers. Percentages
above 10 will be found in 1857-58, 1861-63, 1866-70, 1876-81,
1884-87, and 1892-93. Let us choose for examination the
period 1866-70. The figure for January 1866 is below the
Januaries of previous years ; those of February, March, and
April are also low ; from May to September the figures are
greater than those of 1865 or 1864 ; from October to December
they are greater than those of 1863, 1864, or 1865 ; in December
1867 they are greater than any previous year. Most of the
figures for 1868 are higher than in the nine previous years ;
but from September 1868 onwards the figure is lower than the
one twelve months earlier till September 1872. This wave
of unemployment then lasted from May 1866 to September
1872.

* See Investigations in Currency and Finance.

[Sidenote: Seasonal influence.]
Now let us watch the seasonal influence. In 1866 there
was no fall in the summer except in April, and there was a very
rapid rise in December. In 1867 a fall from April to
August was followed by a rapid rise for four months. There
is a fall from December 1867 to September 1868,
but a rise follows in October, November, and
December ; since the rise does not generally begin till after
August, it will be seen that the general fall did not much delay
the seasonal effect. In the next year, 1869, there is a fall to
a lower minimum in August, but now the rise in December
is very slight ; next year the fall is very quick to August, but
the seasonal rise is not delayed. From this it is clear that the
seasons had their effect throughout the fluctuation except in
the opening year 1866, when there was no fall, and that the
rises in the autumn were very much accentuated. Almost
identical remarks would apply to the period August 1875 to
May 1881. In what month was the condition of employment
1867-70 at its worst ? The greatest figure given is 22.6 per
cent. in December 1867, but unemployment in December is
generally greater than in any other month, and the figures
for any of the following six months may be more unusual ;
the determination of the exact date will be best shown by
diagrams. It may be mentioned that most of these remarks
were suggested by Mr. Hey, the former secretary of the Ironfounders’
Society, who drew up these figures.

[Table, p. 161 : Number of Unemployed Ironfounders, expressed as percentages
of estimated total number of members, month by month : calculated
from figures given in the Annual Report of the Friendly Society of
Ironfounders, 1894.]

[Sidenote: The story from the diagram.]
If we now turn to the diagram, the following facts may be
noticed. The thick line showing the annual average percentages
shows a downward tendency till 1857, followed
by an abrupt rise and fall in 1858-63, then
two years' rise to its original height, returning to a minimum
in 1865 ; the next wave covers seven years, and is marked by an
extraordinarily sharp rise in 1867, and a very low minimum in
1872. The exceptional condition of trade in 1872 could not
last, but the rise is very gradual to 1876, when the next cycle
of trade is marked again by a six years’ wave ; the rise is
not so steep as in the former fluctuation, but lasts longer, and
a higher point is reached : the fall is at about the same angle,
and the minimum in 1882 is about the same as that in 1865.
The next wave came before it appeared to be due, and lasted
seven instead of six years, but was much more moderate, and
again the rise was sharper than the fall. The minimum of
1889 did not endure, and the figure ends with a suggestion
that the maximum will be in 1894, but only at a moderate
height, and the next minimum might be expected in 1898
or 1899, if causes similar to those which influenced earlier trade
depressions were still acting. It may be found, in fact, from
the Board of Trade returns, that, taking all the trade unions who
made returns together, the maximum month was December
1892, and the maximum year was 1893 ; after this the fall is
regular to 1897, and a trifling rise in 1898 is followed by a very
low figure for 1899.*

In Figure 5 the diagram is inverted and greatly compressed,
showing now the percentage employed. If the period 1876-82
is cut off by two vertical lines, readers may see how great were
the amounts of labour lost to the country and wages to the
members of the Ironfounders’ Society in those years. These
figures show a want of employment due to special causes in this
Society more than twice as great as in other Unions whose
returns are available for the same period.

In Figure 5 the annual averages are smoothed by the method
explained above (pp. 136-7), a seven-yearly average † being
taken to correspond to the general wave length. It will be
seen that there is no very marked tendency up or down in the
thirty-nine years, and that the smooth line is never far from
the general average of employment, 91.7.

The comparison of this diagram with that illustrating 
exports (p. 134) is very instructive. Some of the results may 
be thus exhibited : — 

                 Dates of 

Minima           Maxima of 
of Exports.      Unemployment. 

1862             1858 and 1862 
1868             1868 
1879             1879 
1886             1886 
1894             1893 

The figures may also be compared graphically by the methods 
of the previous or following sections. 

The averages for the nineteen Januaries, nineteen Februaries, 
etc., in the years 1855-73, and similar averages 
for the years 1874-93, and the whole period are 
given in the table and exhibited in Figures 2, 3, 4. 

* See Annual Abstract of Labour Statistics, 1895, p. 73, for various 
methods of treating these figures similar to those here discussed. 

† For smoothing and studying periodic curves, see Professor Poynting's 
paper in Statistical Journal, 1884, and Professor Moore's in 191&. 


                 Dates of 

Maxima           Minima of 
of Exports.      Unemployment. 

1866             1865 
1872             1872 
1882             1882 or 1883 
1890             1889 

When we calculated the annual averages just discussed we 
eliminated by that process the seasonal fluctuations ; by this 
new series of averages we eliminate the influences of particular 
years. If we took, for instance, all the November numbers out 
of a series of figures totally uninfluenced by the seasons, if such 
could be found, and compared these with the general average 
for all months, we should in the long run find just as many 
instances above as below this average ; but if the figures were 
influenced by the seasons, we should find a considerably greater 
number above than below, or vice versa. The greater the 
seasonal influence, the greater would be this excess or defect. 
Averaging numbers in this way tends to eliminate the non- 
seasonal causes, for by hypothesis the excesses and defects due 
to them will in the long run balance one another ; and except 
by averaging these cannot be eliminated, unless they can be 
actually calculated. The excess of the November average 
above the general average will be greater than that of October, 
if the seasonal causes exert more influence towards excess in the 
former than in the latter month, and the curve which shows 
these averages will show a resemblance to that which would 
be obtained if the non-seasonal causes were absent. It will 
be only a resemblance for two reasons : first, because in the 
comparatively short series of years with which we are generally 
obliged to be content, a very effective non-seasonal cause will 
leave its mark on the average, as may be seen in the table on 
p. 161 ; secondly, because seasonal and non-seasonal causes are 
often not independent ; a depression of trade is accentuated by 
a sharp winter ; a bad season in a year of bad trade may increase 
the want of employment greatly and suddenly, while a good 
summer in a prosperous year may reduce it almost to zero. 
In the case we are considering the interaction of causes 
tends to exaggerate the seasonal maximum and diminish the 
minimum; in other cases a compensating effect might be 
found. 
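
The averaging described above can be sketched in a few lines of Python. This is only an illustration: the function name, the years and the figures are hypothetical and are not taken from the ironfounders' returns. It averages each calendar month over all the years and subtracts the general average, so that the level peculiar to any one year drops out.

```python
from statistics import mean

def monthly_differences(table):
    """Return 12 differences: each month's average minus the general average.

    `table` maps a year to a list of 12 monthly figures, Jan. to Dec.
    """
    years = sorted(table)
    general = mean(x for y in years for x in table[y])
    return [mean(table[y][m] for y in years) - general for m in range(12)]

# Hypothetical data: one fixed seasonal pattern laid over a different
# level in each year (the levels stand for the non-seasonal causes).
pattern = [2, 1, 1, 0, -1, -1, -2, -1, -1, 0, 1, 1]
levels = {1855: 5, 1856: 9, 1857: 4}
table = {year: [level + p for p in pattern] for year, level in levels.items()}

diffs = monthly_differences(table)
print(diffs)
```

In this artificial case the seasonal and non-seasonal causes are independent, so the pattern is recovered exactly; with real returns only a resemblance is to be expected, for the two reasons given in the text.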

In Figures 2, 3, 4 the curve for the latter half of the year 
is prefixed to that of the calendar year, because the character 
of the yearly waves is seen most clearly from minimum to 
minimum. It may be noticed that the wave in Figure 3 is 
less definite in shape and has a smaller rise and fall than that 
of the earlier period shown in Figure 2 ; it would appear that 
the seasons are losing their influence. 

If there is a definite annual period, that represented by 
Figure 4, it may be expected that a figure of a shape similar 
to this — 

[Figure: the annual wave] 

will be repeated annually in Figure 1 ; it is shown well in 1864, 
18§2, and other years. In the great majority of cases the yearly 
maximum is reached in December or January ; at 
the end of 1858 the maximum is absent, but is 
replaced by a break in the rapidity of the fall ; at the end 
of 1860 there is a rise, but the spring fall following is checked 
by the general upward trend; similar remarks apply to all 
the great fluctuations. There is no doubt that right along the 
line we find at nearly equal intervals these pointed crests above 
the line of averages. 

The minima are not so conspicuous, for the pointed shape 
is absent, trifling causes bring them near the smoothed line, and 
they are easily masked by a general fall or are absent because 
of a general rise. In 1861, however, there is a distinct minimum 
in spite of the strong upward tendency ; the minima are very 
conspicuous throughout the fluctuation of 1865-70 ; and from 
1859 to 1888 the minima are fairly marked, except in 1876, 
1880, and 1881. 

The following figures show the effect of a stationary, rising, 
and falling average annual rate on the shape of the seasonal 
wave : — 

a. Seasonal wave on stationary line of averages. 

b. Seasonal wave superimposed on rising line of averages. 

c. Seasonal wave superimposed on falling line of averages. 

[Figures a, b, c: each wave is drawn over two successive years, Jan. to Dec.] 


These figures are drawn by adding or subtracting the average 
monthly differences from the general average 

Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept. Oct.  Nov.  Dec. 
+1.1  +1.0  +.6   +•»   -.4   -.7   -1.1  -.9   -.7   -.4   0     +.6 

month by month to or from the positions shown on the straight 
lines joining the annual averages. On a rising line the spring 
fall tends to become horizontal and the autumn rise steeper; 
on a falling line the spring fall becomes more rapid and the 
autumn rise is checked. 

If this seasonal wave, added to the slower long-period 
changes, were the complete explanation of these numbers, 
Figure 1 (p. 162) would be entirely composed of modifications 
of Figures a, b, and c. Figure a is exemplified especially in 
1855-57, 1864-65, 1871-73 ; Figure b in 1860-61, 1866-67, 
1877-78, 1883-85 ; Figure c in 1859, 1863, 1880-82, 1886-89. 

As explained above, the two sets of causes are not indepen- 
dent, and these figures are not reproduced exactly ; but the 
resemblance is sufficiently close to make the 
following method of eliminating seasonal fluctua- 
tions partially applicable. Combine the monthly excesses and 
defects just given with the original numbers, by subtracting the 
excesses and adding the defects; this process should tend to 
produce a straight line thus : — 



[Figure: the line from Figure 1 and the corrected figures.] 


But the result is not more than a tendency, because of the 
unusual fall in January 1883, and it is difficult to find a perfect 
example. This method is applied in Figures 6, 7, and 8 in an 
attempt to disentangle the seasonal fluctuations from the effects 
of the commercial crisis of 1872, the depression of 1879, and the 
turn of the tide in 1889. In Figure 6 it is seen that January 1872 
was the best month relatively, though the absolute minimum 
was not reached till June of that year ; from this it appears that 
January 1872 was the turning point of the great inflation, a date 
somewhat earlier than that generally given. The date of the 
maximum of 1879 is left unchanged by this process, and that of 
the 1889 minimum is only shifted one month. 
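
The correction just described (subtract the monthly excesses, add the defects) may be sketched as follows. The twelve differences and the monthly series used here are hypothetical round numbers chosen for the illustration, not the values printed in the text, and the function name is likewise an assumption.

```python
def remove_seasonal(values, differences):
    """Subtract each calendar month's average difference from the series.

    `values` is a flat list of monthly figures beginning in January;
    `differences` holds the 12 average excesses (+) and defects (-).
    Subtracting a negative difference is the same as adding the defect.
    """
    return [v - differences[i % 12] for i, v in enumerate(values)]

# Hypothetical seasonal differences (they sum to zero).
differences = [2, 1, 1, 0, -1, -1, -2, -1, -1, 0, 1, 1]

# Hypothetical series: a steadily rising line plus the seasonal wave,
# carried over two years (24 months).
trend = [10 + i for i in range(24)]
observed = [t + differences[i % 12] for i, t in enumerate(trend)]

corrected = remove_seasonal(observed, differences)
print(corrected)
```

Here the corrected figures fall exactly on the straight line because the series was built that way; in practice, as the text says, the result is no more than a tendency.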

We have still to discuss the criteria of the existence of a 
period. In Figure 1 the optical evidence is sufficient to suggest 
the annual period, but it may be doubted whether 
an annual fluctuation would be suggested by a 
diagram representing wheat prices. It is clear that if the 
monthly entries of any returns whatever were averaged in 
months over any period of years, the averages for January, 
February, etc., would not be exactly equal, even if there were 
no seasonal influence. The following diagrams show various 
averages : — 


[Diagrams of monthly averages: (1) Unemployed ironfounders, as before, July to June ; (2) Wheat prices, shillings per quarter, 1862-76, Jan. to Dec. ; (3) Wheat prices, shillings per quarter, 1877-91 ; (4) Average date of first Sunday in month, 1881-1900.] 

Of these the first three may be expected to be seasonal, while 
the last, which shows the averages of the dates on which fell the 
first Sunday in 20 Januaries, 20 Februaries, etc., in a series of 
years, certainly is not. 

The following simple tests may be applied to decide this 
point. If the period is in any way connected with the seasons, 
it will correspond to some extent to the ordinary weather charts 
of temperature, etc., which have a single annual maximum and 
corresponding minimum. Phenomena affected by the weather 
may also be expected to show a single maximum, nearly coin- 
ciding with the maximum or minimum temperature ; thus the 
maximum unemployed coincides with the minimum length of 
daylight and precedes the minimum temperature. In some 
cases a second subsidiary maximum may be shown, since, for 
example, an excessive death rate may be due to excessive cold 
or heat ; but even in this example further analysis would prob- 
ably show that the one maximum was for the old, the other 
for the young. Wheat prices may also show two minima due 
to the harvests in the two hemispheres. The “ Sunday ” curve 
just given shows four maxima, and is not seasonal. More than 
one maximum is evidence against periodicity till a reason is 
found for their existence. 

The second test is to look at the serial diagram and notice 
how often the maximum occurs in the same month ; non-periodic 
causes will hide the maximum occasionally, but in 
the long run one month will be predominant. In 
Figure 1 the maximum occurs in March and April twice each, 
in February three times, in January eleven times, and in December 
twenty-one times. The maximum is then generally in 
midwinter. The minimum is not in this case so well defined. 

The following table shows how this analysis can be extended : — 

                                                        Times 
                                                        out of 39. 

The percentage of December is greater than that 
  of the preceding November                                 33 
The percentage of December is greater than that 
  of the following January                                  28 
The percentage of December is greater than that 
  of the preceding July                                     33 
The percentage of December is greater than that 
  of the following July                                     30 

The chances against so great a preponderance, if the seasons 
had no influence, are respectively about 65,000 to 1, 160 to 1, 
65,000 to 1, and 1200 to 1.* All the months may be separately 
tested in the same way. This method by no means exhausts 
the evidence, for we have only considered which of two months 
is the greater, and not how great is the excess when it exists. 
On this point the reader is referred to the paper by Professor 
Edgeworth, On Methods of Statistics, in the Jubilee Volume 
of the Royal Statistical Society, p. 206; this should, however, 
be postponed till the mathematical treatment which follows in 
Part II has been studied. 
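
As a hedged illustration of how chances of this kind may be reckoned, the sketch below assumes that, if the seasons had no influence, December would exceed the month compared with it in any one year with probability one-half, independently from year to year. It then computes the exact binomial chance of so many successes or more in 39 trials. The function names are illustrative, and the exact odds so obtained need not agree with the round figures quoted in the text, which may have been reached by an approximation.

```python
from fractions import Fraction
from math import comb

def tail_probability(n, k):
    """Exact P(X >= k) when X counts successes in n fair, independent trials."""
    return Fraction(sum(comb(n, j) for j in range(k, n + 1)), 2 ** n)

def odds_against(n, k):
    """Odds against k or more successes in n trials, as 'so many to 1'."""
    p = tail_probability(n, k)
    return (1 - p) / p

# The four counts in the table, each out of 39 years.
for count in (33, 28, 33, 30):
    print(count, "out of 39: about", round(float(odds_against(39, count))), "to 1")
```

Like the table, this test considers only which of two months is the greater, and takes no account of the size of the excess.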

5. Logarithmic Curves. 

A serious flaw in the graphic method as used in the previous 
sections is that, when we are dealing with a series of increasing 
figures, though the totals year by year may be 
increasing, we are compelled to represent equal 
increments on these totals by equal vertical distances ; 
thus an increment of £20 on a total of £20 is repre- 
sented by the same vertical distance as an increment of £20 on 
a total of £2000. Thus in the annexed figure representing 
exports, the fall from £52,000,000 to £42,000,000 in 1815-16 is 
barely noticeable, though it is a fall of 20 per cent., and was 
connected with very great distress in the manufacturing dis- 
tricts ; while the fall from £305,000,000 in 1883 to £269,000,000 
in 1886 attracts attention immediately, though it is one of 
12 per cent. only. Again the increase of 34 per cent. which 
took place between 1848 and 1850 appears insignificant in com- 
parison with that of 29 per cent. from 1870 to 1872. When we 
are attacking questions of causation it very frequently happens 
that we are more concerned to know the proportionate increase 
than the actual increase. When we are considering the gradual 
growth of our foreign trade, or when we are comparing the 
growth of trade of two countries, a diagram like that annexed 
is likely to give quite a wrong impression of the struggle that 
marked the early stages. We need then a diagram not of 
quantities, but of ratios, where equal vertical distances represent 
no longer equal absolute increments, but equal proportional 
increments, that is, equal rates of increase. By the use of 
logarithms a universal scale can be constructed which serves 


* See Part II, Sect. I, infra. 

this purpose. The non-mathematical student can easily 
accustom himself to the use of diagrams so constructed, by 
studying one where the actual amounts represented are entered, 
and noticing that whatever part of the scale he takes, doubling, 
halving, increasing by 20 per cent. and so on, are always repre- 
sented by the same vertical distances respectively. The con- 
struction of a diagram on this scale is as follows : — 
Write down the numbers in the series to be represented ; 
against them write down their logarithms ; 
on paper divided into equal squares mark at equal intervals on a 
vertical line numbers ascending in regular progression so as to 
include all the logarithms found; mark off the dates on a 
horizontal line; and on the scale thus prepared mark in 
the logarithms, instead of the original numbers. The table on 
p. 173 and the diagram facing p. 171 show the figures of imports 
and exports thus treated. On the right hand of figure 2 the 
position of the absolute numbers is given ; on the left the corre- 
sponding logarithms. A given vertical distance, 1 inch, 
represents the distance .301 on the logarithmic scale ; if we add 
this quantity to the logarithm of any number, we obtain the 
logarithm of twice that number, for $\log a + .301 = \log a + \log 2 = \log 2a$ ; 
for instance, if we increase the height of 
the position which represents £30 by 1 inch, we arrive at the 
position which represents £60. Again if we now add 1.59 of an 
inch, which represents .477 on the logarithmic scale, that is 
log 3, to the logarithm of 2a, we obtain log 6a, and we have — 

$$\log 6a = .477 + \log 2a = .477 + .301 + \log a \ \text{(as above)} = .778 + \log a = \log 6 + \log a ;$$

that is, we arrive at the same position on this scale whether we 
go by means of two separate ratios or by a single compounded 
ratio. Thus a diagram drawn on this principle satisfies the 
necessary conditions that equal vertical distances represent the 
same ratio in whatever part of the scale they are taken, and 
that any number of points can be entered without leading to 
inconsistencies. At the end of this section is given a table of 
the logarithms of 1 to 1000, correct to the third decimal place, 
which will be found sufficient for this purpose. 
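
The property just stated can be checked numerically. The short sketch below is an illustration only: the function name is an assumption, and the scale follows the text in making 1 inch stand for .301, that is, for a doubling.

```python
import math

def height_in_inches(value):
    """Vertical position of `value` on a scale where 1 inch represents log 2."""
    return math.log10(value) / math.log10(2)

# Doubling always moves the point up by 1 inch, wherever we start.
for amount in (30, 300, 2000):
    print(amount, height_in_inches(2 * amount) - height_in_inches(amount))

# Trebling moves it up by log 3 / log 2, about 1.59 inches.
treble = height_in_inches(3 * 60) - height_in_inches(60)
print(round(treble, 2))

# Two separate ratios (x2, then x3) reach the same point as one ratio (x6).
two_steps = 1.0 + treble
one_step = height_in_inches(6 * 30) - height_in_inches(30)
print(math.isclose(two_steps, one_step))
```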

Thus on the diagram given we can find at once that imports 
were doubled in value between 1811 and 1836, again between 
1839 and 1853, again between 1855 and 1866, 
and that the value increased 40 per cent. between 
1886 and 1899. Or we may notice that the excess of the 
value of imports over that of exports was 40 per cent. of the 
latter both in 1850 and in 1880 ; that the value of imports in 
1899 was thrice that of exports in 1860. 

If the eye has been carefully educated to understand a 
diagram of this sort, if the fact that it is a diagram of ratios, 
not of quantities, is firmly impressed on the mind, then the 
diagram answers perfectly the object of the graphic method, 
that is, it gives a true instantaneous impression of a complex 
series of facts. If, on the other hand, it is found that a true 
impression is not received, through inability to take the right 
mental position, then diagrams on the natural scale should be 
employed only, always with the recollection that they may give 
false impressions of ratio.* 

It is to be noticed that no base line should be given in 
diagrams of this class, otherwise a false impression is at once 
obtained. Notice further that, while equal vertical 
differences represent equal ratios from any 
part of the diagram to any other, instead of equal increments as 
on the natural scale, equal degrees of slope represent equal ratios 
of increase (equal accelerations), instead of equal additions in 
equal times as on the natural scale (equal velocities). On the 
logarithmic scale a line rising with convexity to the horizontal 
shows that the ratio of increase is growing, as in imports from 
1830-53 (if the line is smoothed), while concavity, as from 1854 
to 1873, shows a slackening ; but on the natural scale the line is 
convex almost throughout the two periods, showing that the 
actual increments were increasing all the time. 

It would be useful, if space permitted, to offer several 
diagrams on both scales ; for in many series of figures the 
differences exhibited by the two methods are very 
instructive. One case may be signalised where the 
logarithmic scale is specially important, that is, 
when the original numbers represent ratios, not actual numbers. 
Thus in Mr. Sauerbeck's well-known diagram, drawn on the 
natural scale, representing his index-numbers of prices, all the 
numbers included are percentages of their values in certain 
defined years. Suppose that 100, 80, and 60 are the index- 


* Professor Marshall suggests a simple method of correcting this false 
impression in his paper On the Graphic Method of Statistics, in the Jubilee 
Volume of the Journal of the Royal Statistical Society, p. 257 seq. 

percentage of unemployed with the marriage rate. In Fig. 1, 
the numbers are shown on natural scales ; in Fig. 2 the averages 
over twenty-nine years are equated and the numbers are 
shown on a logarithmic scale. We might proceed as on 
p. 158, but to use an alternative method, the 
maxima and minima in various periods are written 
down as in the table on p. 175, and the averages of the fluc- 
tuations from maximum to minimum (expressed as percentages 
of the maximum) are calculated. It is found that a fluctua- 
tion of 8.4 per cent. in the number employed, in those trade 
unions whose returns are accessible,* corresponds to one of 
9.7 per cent. on the marriage rate. To investigate a possibly 
closer correspondence, assume that a portion of the number 
employed do not influence the marriage rate, and find what 
part must be subtracted before this 8.4 per cent. of the total 
forms as much as 9.7 per cent. of the remainder ; the average 
percentage of members of the trade unions at work in the 
selected period was 95.1 ; 8.4 per cent. of this is 7.99, which 
forms 9.7 per cent. of 82.4. Thus 12.7, the difference between 
95.1 and 82.4, may be considered as not influencing the ques- 
tion, and subtracted throughout before logarithms are taken. 
This process would be replaced on the natural scale by equating 
the averages of two series, and drawing one base line so far 
below the other that average fluctuations would be represented 
by the same vertical distance for both series ; which process is 
exactly equivalent to that adopted on p. 158. Expressed 
algebraically, we are now investigating the equation — 

$$\log (y - c) - \log x = k, \ \text{a constant,}$$

where c and k are constants to be so selected as to give 
the closest fit, and y and x are the quantities to be 
compared. 
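
The arithmetic by which the constant c is found may be sketched as follows. The sketch reads the figures of the text as 8.4 per cent., 9.7 per cent. and 95.1 per cent. (the decimal points are assumed), and the function name is illustrative. It seeks the part c to be subtracted so that the same absolute fluctuation forms the larger percentage of the remainder.

```python
def constant_to_subtract(average, fluctuation, target_fluctuation):
    """Return c such that fluctuation * average = target_fluctuation * (average - c)."""
    absolute_fluctuation = fluctuation * average      # 8.4 per cent. of 95.1
    remainder = absolute_fluctuation / target_fluctuation
    return average - remainder

c = constant_to_subtract(average=95.1, fluctuation=0.084, target_fluctuation=0.097)
print(round(0.084 * 95.1, 2))   # the absolute fluctuation
print(round(95.1 - c, 1))       # the remainder of which it forms 9.7 per cent.
print(round(c, 1))              # the part subtracted before logarithms are taken
```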

In the adjacent diagrams. Fig. 1 gives the figures in the 
natural scale ; Fig. 2 gives them on the logarithmic scale, after 
they have been arranged so as to make average percentage 
fluctuations equal ; while in Fig. 3 the shorter period, 1880 -$>, 
is treated in a method precisely similar to that of Fig. 2. The 
actual numbers and logarithms are given on the next page. 

* The figures in columns 2 and 4 in the second table on the next page 
are taken from Mr. G. H. Wood’s paper on Some Statistics of Working Class 
Progress since 1860, Statistical Journal, 1900, where a valuable logarithmic 
diagram will be found, illustrating many of the points of this section. 

A critical account of logarithmic curves, strongly advocating 
their use, is given by Professor Irving Fisher, in the Quarterly 
Publications of the American Statistical Association, June 1917. 

CHAPTER VIII. 

ACCURACY. 

Introductory. 

There is not in existence a perfectly accurate measure- 
ment, physical or economical, just as there is no perfectly 
straight line or perfect fluid. We can best illus- 
trate the nature of economic measurements by 
considering that of physical. It is easy to weigh substances 
accurately to 1 gram : then by obtaining a good balance, we 
can, as our apparatus is improved, weigh accurately to a 
centigram, milligram, and one-tenth of a milligram; but for 
accuracy beyond this the balance fails us. Similarly in measur- 
ing angles, the naked eye can distinguish an object which 
subtends one-thirtieth of a degree ; with a sextant a measure- 
ment can be taken correctly to fifteen seconds of arc; the 
Greenwich astronomers can make observations correct to one- 
hundredth part of a second, but we again come to a point 
beyond which precision is unattainable. 

In such cases the result is stated as correct to a milligram, 
or whatever it may be ; in the same way we speak of an 
estimated sum of money correct to a pound. 

A task which has considerable resemblance to some statis- 
tical estimates, is the measurement of the parallax of the sun, 
which determines its distance from the earth. 
During the eighteenth century astronomers 
estimated it as 10″, equivalent to 96,000,000 miles. 

As methods of observation and instruments were improved, 
observers began to agree that the whole number of seconds was 
8, but gave various estimates for the first decimal figure. Since 
1865 there have been very few estimates which have not given 
8 as the nearest figure for this place (8.8″), while more recent 
observations agree in making the parallax from 8.76″ to 8.78″. 
We may, therefore, consider that the distance is now accurately 
known to within 1 in 400. Notice in this connection, first, that 
the earlier observations have been subject to corrections; 
secondly, that better agreement has been attained as time has 
gone on ; thirdly, that neither absolute agreement nor ab- 
solute accuracy has yet been obtained. So it is with statistical 
measurements ; we might instance the gradual settlement of 
the curve representing expectation of life, the measurement of 
the fall in prices, and the development of wage statistics. 

Again in physical measurements, though we can sometimes 
reach a very high degree of accuracy, as, for instance, in the 
weight of a cubic foot of water which could doubt- 
less be known correctly to one part in a million, in 
other cases we are glad if we can measure to one part in ten, as, 
for instance, in the distance of the nearest fixed star from us, 
which is, roughly, from 34 to 37 billion miles. So in statistics 
it is something if we know that the total capital of the United 
Kingdom was between and 10 thousand million pounds in 
1885, or if we know that the average weekly wage of working- 
men in full work was from 21s. to 27s. in 1886. The weak point 
in such statements is that often when we have made an esti- 
mate which we know to be inexact, we are not able to give any 
estimate of the limits of the error. We are not so definite as 
The Modern Traveller who 

“ . . . knew the weather to a T, 

The longitude to a degree, 

The latitude exactly.” 

We are not able to say “ our estimate is 24s. 5d. ; this is prob- 
ably correct within 3d., and it is not possible that we are as 
much as 6d. wrong ” ; whereas in physical measurements we 
can often give the result as correct to the smallest graduation 
of the instrument employed. 

On the other hand, though we cannot obtain exactness, we 
can in many cases estimate to that degree of accuracy which is 
required for practical purposes. In common use 
only a certain conventional accuracy is needed. 
Thus, to take some miscellaneous instances, the area of an estate 
is given in acres, roods, and poles, but not correct to square 
yards ; the market prices of shares do not change less than 
1/16 ; we keep the day, not the hour, of our birth ; railway 
time-tables do not show seconds ; ocean steamers are timed to 
start at certain hours, not minutes ; height is measured correct 
to one-tenth of an inch ; a hundred yards race is timed to one- 
tenth of a second. Similarly in statistical estimates, we seldom 
need that our results shall be accurate within one per thousand, 
or even 1 per cent. One per thousand of the working week is 
less than three minutes ; 1 per cent. of the week's wage is only 
6d. We do not care to know the population of London within 
100, the expenditure of the Exchequer within £1000, or the 
expectation of life within a day. It is often possible to attain 
practical accuracy within such limits. 

Definition of Error. — For purposes of measurement we 
may take the following definition : — The relative error in an 
estimate is the ratio of the difference between the estimate and the 
true value, to the estimate ; the error is to be reckoned positive 
when the true value exceeds the estimate. 

Thus if the average weekly wage of agricultural labourers 
was in reality 14s., and we estimated it as 13s., our error would 
be $\frac{14 - 13}{13} = \frac{1}{13}$ or 7.7 per cent. ; if we had estimated it as 
15s., the error would be $\frac{14 - 15}{15} = -\frac{1}{15}$ or − 6.6 per cent. 

In algebraic notation, if $u$ be the measurement of a quantity whose 
true value is $u'$, then $\frac{u' - u}{u}$ is the error in the estimate, which we 
shall call $e$ ; so that $e = \frac{u' - u}{u}$ and $u' = u (1 + e)$.* $e$ thus defined 
is the relative error, while $ue$ is the absolute error. 
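
The definition can be tried on the two wage estimates of the example. The sketch below is only an illustration; the function name is an assumption, and exact fractions are used so that the results can be compared with the hand computation.

```python
from fractions import Fraction

def relative_error(estimate, true_value):
    """Error reckoned on the estimate; positive when the true value is larger."""
    return Fraction(true_value - estimate, estimate)

low = relative_error(estimate=13, true_value=14)    # estimate too small
high = relative_error(estimate=15, true_value=14)   # estimate too large
print(low, float(low))
print(high, float(high))

# The true value is recovered as estimate * (1 + e).
print(13 * (1 + low), 15 * (1 + high))
```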


In the nature of things, when we are dealing with errors, 
we do not know their magnitude ; the most we can know 
is their probable and possible extent. We 
might estimate, for instance, the percentage of 
unemployed in a certain year as 4.5, and add, from informa- 
tion in our possession (coming from a study of wage-bills 
or the reports of relief agencies), that we considered this to 
be within .5 of the fact ; we should then write the number 
4.5 ± .5, meaning that the error in the estimate as defined above 
was unlikely to be more than $\frac{.5}{4.5} = \frac{1}{9}$, or 11 per cent., the corre- 
sponding absolute error being .5. In such a case we can also 

* It is sometimes more convenient to write $u = u' (1 + \epsilon)$, reckoning the 
error relatively to the true value. Then $\epsilon = -e + e^2$ approximately, and 
when $e$ is less than 10 per cent. we may take $\epsilon = -e$. 

give definite limits. The percentage unemployed must lie 
between 0 and 100 ; and if we could actually enumerate 1 per 
cent. of the working-class as out of work, and also 92 per 
cent. as in work, we should know that the number required 
was between 1.0 and 8.0 per cent., and the maximum error in 
our estimate, 4.5, was $\frac{3.5}{4.5}$ or 78 per cent. Even this is 

more precise than the original statement, “ the percentage is 
4.5, error unknown.” By further investigation we might 
perhaps bring the limits of error nearer to each other, and 
decide that it was practically certain that the percentage 
required was between 3.5 and 4.5 ; then we ought to say “ the 
number unemployed is .04 . . . of the working-class, the 
estimate being correct to the last figure given. ” This statement 
is of the same nature as, “ The body weighs 15 lbs. 3 oz., correct 
to an ounce.” 

While, on the one hand, it is clear that we cannot often 
obtain close definite limits to our errors, on the other we can 
very often see that some of the digits in a total are almost 
certainly right and others almost certainly wrong. Thus when 
we see in the Registrar-General’s Report that the population of 
the United Kingdom in 1895 was 39,124,496, the estimate being 
made from the census of 1891, and the increase calculated on 
the basis of the increase since 1881, we may be certain that 
the last two, or the last three, digits are no better than guess- 
work ; while the first two, or the first three, are correct. Thus 
the statement should read : Population was 39.1 millions, or 
39,124,000 ± 5000, or whatever figures our examination of the 
varying rate of progress of the population led us to adopt, and 
such a statement is actually more correct than the previous 


one. 

It is the custom in many classes of estimates to give the 
figures to the uttermost farthing. This is possibly right in 
official publications ; for the duty of the office 
is to receive and tabulate returns, stating how 
and whence they came, and it may leave to the economist or 
the statistician the task of deciding the degree of accuracy 
pertaining to them. But in summary descriptions and 
accounts, and in scientific estimates, it is not merely unneces- 
sary to give these last figures (both because they are not 
accurately known, and because they generally have no impor- 
tance to the argument or significance to the reader), but it is 
positively inaccurate. The easiest way to avoid the inaccuracy 
is simply to state totals in so many thousands (e.g., the earth 
is 8000 miles in diameter), or if for any reason more exact 
measure be required (as when we are comparing the equatorial 
diameter with the smaller one through the poles), the scientific 
way is to give the number as far as it has been fairly calculated, 
and to indicate its precision. 


Rules for Computing the Effect of Relative Errors. 

We may now give some rules connecting the errors of a 
complex estimate with those of the elements which form it. 

I. The error in an estimated sum is equal to the sum of the 
errors in the parts when each is multiplied by the ratio of the 
corresponding part to the sum. 

For if we estimate $n$ quantities as $u_1, u_2, \ldots, u_n$, and their sum 
as $u$, so that $u = u_1 + u_2 + \ldots + u_n$, and the errors of the 
quantities are $e_1, \ldots, e_n$, and that of the sum is $e$ : 
then the true value of the sum is $u (1 + e)$, and the true values of the 
parts are $u_1 (1 + e_1)$, $u_2 (1 + e_2)$, . . . , so that — 

$$u (1 + e) = u_1 (1 + e_1) + u_2 (1 + e_2) + \ldots ,$$

but 

$$u = u_1 + u_2 + \ldots ;$$

hence, by subtraction, 

$$u e = u_1 e_1 + u_2 e_2 + \ldots ,$$

and 

$$e = e_1 \times \frac{u_1}{u} + e_2 \times \frac{u_2}{u} + \ldots .$$


The formula is easily adapted to the case where some of the 
parts are subtractive. 

To take an arithmetical example, if average working-class 
expenditure on food, clothes and rent was estimated in 1914 as 
25s., 5s. 6d., and 6s. 6d. respectively, while the true averages 
were 27s., 4s. 6d., and 6s., so that the errors are $+\frac{2}{25}$, $-\frac{2}{11}$, 
and $-\frac{1}{13}$, then the error in the sum of the three is — 

$$+\frac{2}{25} \text{ of } \frac{25}{37} - \frac{2}{11} \text{ of } \frac{5.5}{37} - \frac{1}{13} \text{ of } \frac{6.5}{37} = +.054 - .027 - .0135 = +.0135, \ \text{or} + 1\tfrac{1}{3} \text{ per cent.}$$

We can apply the rule to the important case where we 
can estimate a great part of a required total with considerable 
accuracy, while we are ignorant of a smaller part. Thus we 
may receive returns from several unions that 33,650 are out 
of work, and have reason to know that the error is not more 
than 1 per cent., while some smaller unions do not send any 
returns ; we make an estimate for the smaller unions, say that 
1000 of their members are unemployed, and suppose a very 
large error, say ⅔ or 67 per cent. Then the error in the total is 
less than — 

$$\frac{1}{100} \text{ of } \frac{33650}{34650} + \frac{2}{3} \text{ of } \frac{1000}{34650} = .029, \ \text{or less than 3 per cent.,}$$

an error very much nearer that of the larger returns than that 
of the smaller. In the preceding sentence we say " less than," 
because we assume that we have taken an outside limit for the 
smaller errors. 
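The limit quoted for the unemployment returns can be reproduced in a few lines. This is an illustrative sketch only: the function name `error_limit_of_sum` is invented here, and the larger return is taken as 33,650, the figure implied by the total of 34,650 used in the calculation.

```python
def error_limit_of_sum(parts, error_limits):
    """Outside limit of the relative error of a sum, by Rule I.

    Each limit is weighted by the share of its part in the estimated total;
    absolute values are used because only outside limits are known.
    """
    total = sum(parts)
    return sum(abs(limit) * part / total for part, limit in zip(parts, error_limits))


# Larger unions: 33,650 unemployed, error not more than 1 per cent.
# Smaller unions: 1,000 unemployed (a guess), error supposed as large as 2/3.
limit = error_limit_of_sum([33650, 1000], [0.01, 2 / 3])
print(round(limit, 3))  # 0.029
```

The badly known part contributes about two-thirds of the limit, yet the whole stays under 3 per cent. because that part is so small a share of the total.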


II. The error in the arithmetic average of several estimates is
the sum of the errors of these estimates, when each is multiplied by
the ratio of the corresponding estimate to the sum of the
estimates.

For if $m_1, m_2, \ldots m_n$ are $n$ estimates of quantities whose true
values are $m_1(1+e_1)$, $m_2(1+e_2)$, . . . , the estimated and
true averages are respectively:

$$\frac{m_1 + m_2 + \ldots + m_n}{n} \quad \text{and} \quad \frac{m_1(1+e_1) + m_2(1+e_2) + \ldots + m_n(1+e_n)}{n},$$

and the error in the average is:

$$\frac{\dfrac{m_1(1+e_1) + m_2(1+e_2) + \ldots}{n} - \dfrac{m_1 + m_2 + \ldots}{n}}{\dfrac{m_1 + m_2 + \ldots}{n}} = \frac{e_1 m_1 + e_2 m_2 + \ldots}{m_1 + m_2 + \ldots} = e_1 \times \frac{m_1}{Sm} + e_2 \times \frac{m_2}{Sm} + \ldots,$$

where $Sm$ denotes the sum of all the $m$'s.


It is easily seen that no individual error can have much 
influence on the result, that the error in the average would be 
nearly of the same magnitude as one of the individual errors, if 
these were not very unequal and all positive or all negative, and 
that if, as is generally the case, some are positive and some 
negative (a point we shall consider presently), the error would 
be considerably lessened. 
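
The effect of the signs of the errors on an average can be seen in a small numerical sketch. The figures below are hypothetical and are not taken from the text; the function names are likewise chosen only for illustration.

```python
def error_of_average_by_rule(estimates, errors):
    """Rule II: each error weighted by its estimate's share of the sum of estimates."""
    total = sum(estimates)
    return sum(e * m / total for m, e in zip(estimates, errors))


def error_of_average_direct(estimates, errors):
    """The same error, computed from the estimated and true averages themselves."""
    n = len(estimates)
    estimated_average = sum(estimates) / n
    true_average = sum(m * (1 + e) for m, e in zip(estimates, errors)) / n
    return (true_average - estimated_average) / estimated_average


estimates = [24.0, 25.0, 26.0, 25.0]        # hypothetical estimates
all_positive = [0.04, 0.05, 0.04, 0.05]     # errors all of one sign
mixed_signs = [0.04, -0.05, 0.04, -0.05]    # same magnitudes, signs mixed

print(error_of_average_by_rule(estimates, all_positive))  # about 0.045
print(error_of_average_by_rule(estimates, mixed_signs))   # about -0.005
```

With errors all of one sign the average carries an error of the same order as the individual errors; with mixed signs it falls to about a ninth of that size in this instance.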

III. The error in a weighted average is the sum of (1) an error
due to errors in the quantities, similar to the error of an
unweighted average, and (2) an error due to errors in the weights,
which becomes very small when the original quantities are nearly
equal.


Let $W_1, W_2, \ldots W_n$ be estimated weights applied to $n$ estimated
quantities $M_1, M_2, \ldots M_n$, and let the true values of
the weights be $W_1(1+\epsilon_1)$, $W_2(1+\epsilon_2)$ . . . and of the
quantities be $M_1(1+e_1)$, $M_2(1+e_2)$, . . .

Write $M_w = \frac{SWM}{SW}$, so that $M_w$ is the estimated weighted average,
and let $M_w(1+E)$ be its true value.

Then

$$M_w \cdot E = \frac{S\{W(1+\epsilon)\,M(1+e)\}}{S\{W(1+\epsilon)\}} - \frac{SWM}{SW}$$

$$= \frac{SW \cdot S\{W_t M_t (1+e_t)(1+\epsilon_t)\} - SWM \cdot S\{W_t(1+\epsilon_t)\}}{SW \cdot S\{W(1+\epsilon)\}},$$

where the suffix $t$ denotes any selected quantity, etc.

Then:

$$E \cdot SWM \cdot S\{W(1+\epsilon)\} = SW \cdot S(W_t M_t e_t) + SW \cdot S\{W_t M_t(\epsilon_t + e_t\epsilon_t)\} - SWM \cdot S(W_t\epsilon_t).$$

Now suppose $E$, $e_t$, $\epsilon_t$ to be as small as .1, and neglect
products which are as small as .01.

$$E \cdot SWM \cdot SW = SW \cdot S(W_t M_t e_t) + S\{W_t(M_t \cdot SW - SWM)\epsilon_t\},$$

$$\therefore E = \frac{S(W_t M_t e_t)}{SWM} + \frac{S\{W_t(M_t \cdot SW - SWM)\epsilon_t\}}{SWM \cdot SW}.$$

The term involving $e_t$, the error in a quantity, is the same as
that in Rule II., if $W_t M_t$ is written for $m_t$, etc.

The coefficient of $\epsilon_t$ needs further analysis.

Since $SWM = M_w \cdot SW$, $M_t \cdot SW - SWM = SW \cdot (M_t - M_w) = m'_t \cdot SW$,
where $m'_t$ is the excess of a quantity over the weighted average.

$$\therefore E = S\left(\frac{W_t M_t}{SWM} \cdot e_t\right) + S\left(\frac{W_t m'_t}{SWM} \cdot \epsilon_t\right).$$

Hence the resulting error due to the errors in quantities involves
the magnitudes $M_1$, $M_2$, etc., while that due to the errors in weights
involves only the deviations of these quantities from their weighted
average. These deviations are individually small if the dispersion
of the quantities about their mean is small relatively to that mean.
Further, the sum of the coefficients $S(W_t m'_t) = S(W_t M_t) - M_w \cdot SW = 0$;
if the errors in weights are all equal the resulting error in the
average is zero, as is evident a priori, and if positive errors are not
generally found with positive deviations ($m'_t$) and negative with
negative, and if large errors are not generally found with large
weights (and vice versa), the sum of the terms tends to be small.

Hence the errors in weights have an effect which not only
diminishes from the same causes as affect the errors in quantities,
but also have coefficients which have a strong tendency to neutralise
one another, unless the magnitudes of the errors, quantities and
weights are associated with each other in special ways. Great
errors are required in the weights, if many quantities are involved,
to make an appreciable error in the average. In fact, the errors
in quantities have so much more influence than those in weights, when
once the weights have been reasonably estimated, if the quantities are
not very unequal, that errors in the weights can very frequently be
neglected. Several numerical examples of this principle were given
in the section on weighted averages.
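
The two parts of the error in a weighted average can be separated numerically. The sketch below is illustrative only: the weights, quantities and errors are hypothetical, and the function names are invented for the purpose. It computes the term due to errors in the quantities and the term due to errors in the weights, and compares their sum with the error found directly.

```python
def weighted_average(weights, quantities):
    return sum(w * m for w, m in zip(weights, quantities)) / sum(weights)


def rule_iii_terms(weights, quantities, quantity_errors, weight_errors):
    """First-order error of a weighted average, split as in Rule III.

    Returns (term due to errors in quantities, term due to errors in weights).
    """
    swm = sum(w * m for w, m in zip(weights, quantities))
    mean = swm / sum(weights)
    from_quantities = sum(
        w * m * e for w, m, e in zip(weights, quantities, quantity_errors)
    ) / swm
    # Only the deviations of the quantities from their weighted average enter here.
    from_weights = sum(
        w * (m - mean) * eps for w, m, eps in zip(weights, quantities, weight_errors)
    ) / swm
    return from_quantities, from_weights


# Hypothetical figures: quantities nearly equal, weights badly estimated.
weights = [10.0, 20.0, 30.0, 40.0]
quantities = [95.0, 100.0, 105.0, 100.0]
quantity_errors = [0.02, -0.01, 0.03, 0.01]
weight_errors = [0.10, -0.10, 0.05, -0.05]

from_q, from_w = rule_iii_terms(weights, quantities, quantity_errors, weight_errors)

true_weights = [w * (1 + eps) for w, eps in zip(weights, weight_errors)]
true_quantities = [m * (1 + e) for m, e in zip(quantities, quantity_errors)]
exact = weighted_average(true_weights, true_quantities) / weighted_average(weights, quantities) - 1

print(from_q, from_w, exact)
```

In this instance the weights are wrong by 5 to 10 per cent. and the quantities by 1 to 3 per cent., yet the term due to the weights is only a small fraction of the term due to the quantities. The difference between the sum of the two terms and the exact error comes from the neglected products of errors.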


IV. The error in a product is approximately the sum of the
errors in its factors, due regard being paid to sign.

For if $f_1, f_2, \ldots f_n$ are the estimated factors, whose true values
are $f_1(1+e_1)$, $f_2(1+e_2)$, . . . , then the error of the product

$$= \frac{f_1(1+e_1) \cdot f_2(1+e_2) \ldots - f_1 \cdot f_2 \ldots}{f_1 \cdot f_2 \ldots}$$

$$= (1+e_1)(1+e_2) \ldots - 1 = e_1 + e_2 + \ldots + e_n,$$

if we neglect products of two or more $e$'s.

The $e$'s are equally likely, a priori, to be positive or negative.
If two $e$'s are of different signs, they tend to neutralise one
another. The error in a product may be great if all the errors
in the factors are of the same sign, even if they are small
individually.

For example, if we estimate that 100 men are earning on the
average 25s. each, while in reality there are 105 men earning
26s., the error in the estimated total sum earned is, by formula,

$$\frac{5}{100} + \frac{1}{25} = .09.$$

If, with the same estimates, the real quantities had been
105 and 24s., the error in the product would have been

$$\frac{5}{100} - \frac{1}{25} = .01.$$
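
The two wage examples can be compared with the exact error of the product, to show how much the neglected products of errors amount to. This is a minimal sketch with function names chosen here for illustration.

```python
def product_error_by_rule(errors):
    """Rule IV: the approximate error of a product is the sum of the errors of its factors."""
    return sum(errors)


def product_error_exact(errors):
    """Exact relative error of a product: (1 + e1)(1 + e2)... - 1."""
    product = 1.0
    for e in errors:
        product *= 1.0 + e
    return product - 1.0


# 100 men at 25s. estimated; in reality 105 men at 26s.
same_sign = [5 / 100, 1 / 25]
# 100 men at 25s. estimated; in reality 105 men at 24s.
opposite_signs = [5 / 100, -1 / 25]

for errors in (same_sign, opposite_signs):
    print(round(product_error_by_rule(errors), 4), round(product_error_exact(errors), 4))
```

By the rule the errors are .09 and .01; the exact figures are .092 and .008, the difference in each case being the neglected product $e_1 e_2 = \pm .002$.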


V. The error in a ratio is approximately the difference
between the errors in its two terms, due regard being had to sign.

For if $u_1$, $u_2$ be the estimated terms, whose true values are
$u_1(1+e_1)$ and $u_2(1+e_2)$, then the error in the ratio is:

$$\frac{\dfrac{u_1(1+e_1)}{u_2(1+e_2)} - \dfrac{u_1}{u_2}}{\dfrac{u_1}{u_2}} = \frac{1+e_1}{1+e_2} - 1 = \frac{e_1 - e_2}{1+e_2}$$

$$= (e_1 - e_2)(1 - e_2 + \ldots)$$

$$= e_1 - e_2,$$

if we neglect terms of the second order in the $e$'s.

If the errors in the terms are both positive or both negative,
they tend to neutralise one another; if they are also nearly equal,
the error in the ratio becomes very small.
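
A short numerical sketch shows the neutralising of errors of like sign in a ratio. The errors used are hypothetical and the function names are chosen here for illustration.

```python
def ratio_error_by_rule(e_numerator, e_denominator):
    """Rule V: the approximate error of a ratio is the difference of the errors of its terms."""
    return e_numerator - e_denominator


def ratio_error_exact(e_numerator, e_denominator):
    """Exact relative error of a ratio: (1 + e1) / (1 + e2) - 1."""
    return (1.0 + e_numerator) / (1.0 + e_denominator) - 1.0


# Hypothetical errors of 5 and 4 per cent. in the two terms.
print(ratio_error_by_rule(0.05, 0.04), ratio_error_exact(0.05, 0.04))    # like signs
print(ratio_error_by_rule(0.05, -0.04), ratio_error_exact(0.05, -0.04))  # unlike signs
```

With like signs the ratio is wrong by about 1 per cent. only; with unlike signs the errors add, and the ratio is wrong by about 9 per cent.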

We can apply Rule V. to the error in comparison of two
averages of similar quantities estimated at different dates.

With the same notation as under Rules II. and III., using
$m$, $e$, $\epsilon$ for the quantities at one date, and $m'$, $e'$, $\epsilon'$ for similar
quantities at another date, then the error in the ratio of the simple
average of $m'_1, m'_2 \ldots$ to the simple average of $m_1, m_2 \ldots$ is:

$$S\left(e' \cdot \frac{m'}{Sm'}\right) - S\left(e \cdot \frac{m}{Sm}\right).$$

Now if the quantities have not changed much during the period
between two observations, the fraction $\frac{m_1}{Sm}$ will differ little from
$\frac{m'_1}{Sm'}$, and so on.

Neglecting these differences in comparison with the quantities
themselves, a legitimate process when we are estimating the
approximate influence of errors, we have:

$$\text{Error in the ratio of the simple averages} = S\left\{(e' - e) \cdot \frac{m'}{Sm'}\right\}.$$

If the two estimates have been made under nearly similar
circumstances, leading to similar chances of errors, $e'_1$ and $e_1$ are
likely to be not only of the same sign, but nearly equal.

Write $d_1, d_2 \ldots$ for $(e'_1 - e_1)$, $(e'_2 - e_2)$ . . . , and we have:

$$\text{Error} = S\left(d \cdot \frac{m'}{Sm'}\right),$$

where the $d$'s may be small.


The corresponding analysis for the error in the ratio of two
weighted averages is too complicated to be given here;* but
using the principle that errors in weight are less important than
errors in quantity, which applies with slight modifications, we
may use the formula just given for the first approximation to
the error in the ratio of two weighted averages. This formula
may be put in words:

* It will be found in the Statistical Journal, 1911, pp. 85 seq., and in a
modified form in Part II, Appendix, Note 7.

VI. The error in the ratio of two averages of similar series
of quantities, estimated at different dates, is approximately equal
to the sum of the differences between the errors in the
corresponding terms of the two series, each multiplied by the ratio of
the latter of these corresponding terms to the sum of all the terms
at the latter date.


This rule is so important that it will be worth while to
illustrate it by an example, in which a further
quantity will be introduced.

If in each of two years we are able to estimate, as in our example
under Rule I., one part of a total more accurately than another part,
we can use the following formulae:

                                  First Year.              Second Year.

Estimated numbers or weights      $w$; error $\epsilon$            $w'$; error $\epsilon'$
Estimated average income, or
  quantity                        $m_1$; error $e_1$          $m'_1$; error $e'_1$
Estimated number, less
  accurately known                $rw$; error in $r$, $\rho$     $r'w'$; error in $r'$, $\rho'$
Estimated income                  $m_2$; error $e_2$          $m'_2$; error $e'_2$

$e_1$ and $e'_1$ are, by hypothesis, less than $e_2$ and $e'_2$.

Error in average for first year:

$$= \left\{\frac{w(1+\epsilon)\,m_1(1+e_1) + r(1+\rho)\,w(1+\epsilon)\,m_2(1+e_2)}{w(1+\epsilon) + r(1+\rho)\,w(1+\epsilon)} - \frac{wm_1 + rwm_2}{w + rw}\right\} \div \frac{wm_1 + rwm_2}{w + rw}$$

$$= e_1 \cdot \frac{m_1}{m_1 + rm_2} + e_2 \cdot \frac{rm_2}{m_1 + rm_2} + \rho \cdot \frac{r}{1+r} \cdot \frac{m_2 - m_1}{m_1 + rm_2},$$

if we neglect products of $e$ and $\rho$.

Here the errors, $e_2$ and $\rho$, connected with the less accurately
known part, are each multiplied by $r$, the ratio of the weight of
that part to the weight of the better known part, $\rho$ is multiplied
by $m_2 - m_1$, which in many cases is small, while $e_1$, the remaining
error, is by hypothesis small.

If for simplicity of argument we assume that the ratio of the
unknown part to the whole (but not the error in estimating it)
has remained unchanged, and also that the ratio of the estimated
average incomes of the two parts has not altered, we have for the
error in comparison:

$$(e'_1 - e_1) \cdot \frac{m_1}{m_1 + rm_2} + (e'_2 - e_2) \cdot \frac{rm_2}{m_1 + rm_2} + (\rho' - \rho) \cdot \frac{r}{1+r} \cdot \frac{m_2 - m_1}{m_1 + rm_2}.$$

Thus in estimating the change in average wages of Scotch
agricultural labourers, we have figures similar in character to
the following:

                                      1867.       Later year.
                                                  £    s.   d.
Married Ploughmen.
  Estimated number                    1,000       1,200
  Average income                      £36         49    0    0
  Supposed true number                1,010       1,220
  Supposed true average income        £35         48    0    0

Farm-Servants.
  Estimated number                    200         240
  Average income:
    Money                             £21         27    5    0
    Estimated value of board          13          14    0    0
    Total                             £34         41    5    0
  Supposed true number                220         240
  Supposed true total income          £37         47    0    0

Here $w = 1{,}000$, $m_1 = 36$, $r = \frac{1}{5}$, $m_2 = 34$, $w' = 1{,}200$, $m'_1 = 49$, $r' = \frac{1}{5}$,
$m'_2 = 41\frac{1}{4}$, $\epsilon = \frac{1}{100}$, $e_1 = -\frac{1}{36}$, $\rho = \frac{9}{101}$, $e_2 = \frac{3}{34}$, $\epsilon' = \frac{1}{60}$, $e'_1 = -\frac{1}{49}$,
$\rho' = -\frac{1}{61}$, $e'_2 = \frac{23}{165}$.

Here it is supposed that we have overvalued the income of
the married ploughmen, and undervalued that of the farm-servants
in both cases. We suppose, as is the fact, that the
value of the board and other perquisites of the farm-servants
cannot be estimated with precision, and that the proportionate
numbers in the two classes are not accurately known.

Substituting in the above formula we find that the error in
the estimated ratio of the average incomes of the two classes
together in the two years is:

+ .0062, due to errors in estimates of income of ploughmen.
+ .0081, due to errors in estimates of income of servants.
+ .0008, due to errors in ratios of the numbers in the two classes.

Thus the last error, due to weights, is very small, and the
second error, due to ignorance of the value of board, is reduced
by the smallness of the number employed to a magnitude
comparable with the first.

The whole error is, therefore, by formula + .0151. Going
to the actual figures, we find the estimated ratio of the second to
the first to be 1.3376 to 1, and the supposed true ratio to be
1.3529 to 1; that is, the error is $\frac{1.3529 - 1.3376}{1.3376} = +.011$.

The difference between the two methods of calculation is
accounted for by the neglect of the less important terms.

It is to be noticed that the error in the ratio of two quantities
is not the same as the error which we might be inclined to
estimate, the error in the percentage increase. Thus in the
case just taken, the estimated and true percentage increases are
33.8 and 35.3, and the relative error in the percentage increase
is .045. For accuracy in such calculations, then, we require
the error found by formula, according to Rule VI., to be very
small.
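
The figures of the ploughmen and farm-servants example can be recomputed as follows. This is a sketch under stated assumptions rather than the author's own working: incomes are entered in pounds (£41 5s. = 41.25), every error is taken as (true − estimated) ÷ estimated, and the coefficients of the formula are formed from the first-year values of $m_1$, $m_2$ and $r$, a choice which reproduces the three components quoted in the text. All function and variable names are invented for illustration.

```python
def average(number_a, income_a, number_b, income_b):
    """Average income of two classes taken together."""
    return (number_a * income_a + number_b * income_b) / (number_a + number_b)


def relative_error(estimate, true_value):
    return (true_value - estimate) / estimate


def errors(estimated, true):
    """Errors e1, e2 in the two incomes and rho in the ratio of the numbers."""
    n1, m1, n2, m2 = estimated
    t1, tm1, t2, tm2 = true
    return (relative_error(m1, tm1),
            relative_error(m2, tm2),
            relative_error(n2 / n1, t2 / t1))


# (ploughmen, their income, farm-servants, their income)
estimated_first = (1000, 36.0, 200, 34.0)
true_first = (1010, 35.0, 220, 37.0)
estimated_second = (1200, 49.0, 240, 41.25)
true_second = (1220, 48.0, 240, 47.0)

e1, e2, rho = errors(estimated_first, true_first)
e1_later, e2_later, rho_later = errors(estimated_second, true_second)

# Coefficients from the first-year estimates (an assumption, see above).
m1, m2 = estimated_first[1], estimated_first[3]
r = estimated_first[2] / estimated_first[0]
base = m1 + r * m2

from_ploughmen = (e1_later - e1) * m1 / base
from_servants = (e2_later - e2) * r * m2 / base
from_numbers = (rho_later - rho) * r / (1 + r) * (m2 - m1) / base
by_formula = from_ploughmen + from_servants + from_numbers

estimated_ratio = average(*estimated_second) / average(*estimated_first)
true_ratio = average(*true_second) / average(*true_first)
actual = relative_error(estimated_ratio, true_ratio)

# Error in the percentage increase, as distinct from the error in the ratio.
error_in_increase = relative_error(estimated_ratio - 1, true_ratio - 1)

print(round(from_ploughmen, 4), round(from_servants, 4), round(from_numbers, 4))
print(round(by_formula, 4), round(actual, 3))
print(round(estimated_ratio, 4), round(true_ratio, 4), round(error_in_increase, 3))
```

The last line brings out the closing remark of the passage: an error of about 1 per cent. in the ratio becomes one of about 4.5 per cent. in the percentage increase, because the same absolute difference is divided by .3376 instead of 1.3376.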

Another example is found from the well-known difficulty of
estimating the relative importance of expenditure on clothing in a
workman's family budget.

The following estimates were used in the Report on the Cost
of Living, 1918 (Cd. 8980, pp. 7, 18 and 23).

Skilled Workmen, Average Weekly Expenditure.

                 1914.       1918.         Ratio.
Food             27s.        49s. 10d.     1.84
Clothing         7s.         13s. 9d.      1.96
Together         34s.        63s. 7d.      1.864

Here we take $w = 27$, $r = \frac{7}{27}$, $m_1 = 1.84$, $m_2 = 1.96$.

Suppose that $r$ ought to have been taken as $\frac{1}{3}$, and $m_1$, $m_2$ as
1.90, 2.10.

Then $e_1 = \frac{.06}{1.84} = .0326$, $e_2 = \frac{.14}{1.96} = .0714$, and $\rho = \frac{2}{7} = .286$.

The resulting error by formula is:

+ .0256, due to error in the ratio of food expenditure at the two dates.
+ .0155, due to error in the ratio of clothing expenditure at the two dates.
+ .0030, due to error in the ratios of the expenditures on clothing
         and food.

And the whole relative error is .044.

The effects of the errors are in the reverse order of their
magnitude, and the great error in the clothing ratio barely affects
the second decimal place in the result.
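
The three components in the clothing example follow from the formula for the error in an average of two parts. The sketch below recomputes them; the function name `average_error_terms` is invented here, and the symbols follow the text ($m_1$, $m_2$ the two ratios, $r$ the ratio of clothing to food expenditure in 1914).

```python
def average_error_terms(m1, m2, r, e1, e2, rho):
    """Three first-order terms of the error in the average (m1 + r*m2) / (1 + r).

    e1, e2 are the errors in m1, m2 and rho is the error in r.
    """
    base = m1 + r * m2
    return (e1 * m1 / base,
            e2 * r * m2 / base,
            rho * r / (1 + r) * (m2 - m1) / base)


m1, m2, r = 1.84, 1.96, 7 / 27
e1 = (1.90 - m1) / m1          # .0326
e2 = (2.10 - m2) / m2          # .0714
rho = (1 / 3 - r) / r          # 2/7, or .286

from_food, from_clothing, from_proportion = average_error_terms(m1, m2, r, e1, e2, rho)
whole = from_food + from_clothing + from_proportion

print(round(from_food, 4), round(from_clothing, 4), round(from_proportion, 4))
print(round(whole, 3))
```

The error of 28.6 per cent. in $r$ contributes least, because it is multiplied by the small difference $m_2 - m_1 = .12$.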

If, however, $m_2 - m_1$ had been larger, that is if the estimated
increase in expenditure on clothing had been much greater than
that on food, the effect of this error would have been proportionately
more.

We return to the whole question of relative errors as illuminated
by the theory of probability in Part II, Chapter IV, below.

Biassed and Unbiassed Errors. 

In the consideration of all errors in averaging or comparing,
it is important to distinguish two classes of errors, those which
are biassed and those which are unbiassed. The
difference can be made clear by illustrations. If a
number of men are sent to investigate the condition
of an industry in different places, with a view of proving
that wages are high, conditions of work healthy, and so on, they
would probably, by examining only the best conducted works,
and taking the wages only of the more skilled and regular workmen,
produce an average for each town which would be too
high. On the other hand, if there was no brief to be held, but
the investigation was impartial, the commissioners would in
some towns take too high an average, in others too low,
according to their idiosyncrasies and to circumstances. In the
first case, the errors would be biassed, all in the same direction,
all tending to increase the average, whose error would be
equal to the average error in the different towns. In the
second case, the errors would be unbiassed, just as likely to
be in excess or defect, and the more estimates made, the smaller
would the resulting error be. The following figures would
illustrate this:





                                  True.     Biassed      Unbiassed
                                            Estimate.    Estimate.
                                   s.          s.           s.
Average Wages in District a        24          25           24
Average Wages in District b        23          25           25
Average Wages in District c        26          27           25
Average Wages in District d        27          28           28
Average Wages in District e        28                       27

Averages                          25.6                     25.8
Errors                            ...                       1%

In measuring the distance of a bicycle ride on a mile-stoned
road, it is found that the distances between successive mile-stones
are not exact, but perhaps 50 to 100 yards out; but
it may be nearly as likely that the errors will be in excess or
defect, and the greater the distance gone the smaller will be
the error, as defined. The errors are unbiassed. If, on the
other hand, the bicyclist trusts to his cyclometer, he will
have to deal with a biassed error, for the instrument will not
fit the wheel exactly, but will always register say 1800 yards
when the machine has gone a mile. This is a case where the
bias can be measured and allowed for, whereas the unbiassed
errors must be left to eliminate themselves. It is frequently
the case that biassed errors are due to a wrongly graduated
instrument; unbiassed to separate faulty measurements.

In the census returns, the fact that many women return
themselves as younger than their birth certificate states, causes
a biassed error in the average age of the population; the fact
that people frequently return their ages at the nearest round
number causes unbiassed error, and on the whole affects the
average little. It is not improbable that in the Wage Census of
1906 there was some tendency to obtain returns from the
more liberally conducted establishments in some industries;
this causes a biassed error in the average obtained. With these
illustrations we can pass on to another principle
of great importance. Unbiassed errors are of little
importance compared with biassed errors in a simple
estimate; but biassed errors diminish when the ratio
of two similar estimates is taken.

For in an average of several quantities, which have biassed
errors ($\eta_1, \eta_2 \ldots$) and unbiassed errors ($e_1, e_2 \ldots$), it is easy
to see from Rule II. that the resulting error may be written

$$S\left(e \cdot \frac{m}{Sm}\right) + S\left(\eta \cdot \frac{m}{Sm}\right).$$

In the first term, the errors being unbiassed, many of them are
positive, many of them negative, and they tend to neutralise one
another; in fact, if $E$ is typical of the errors $e_1, e_2 \ldots e_n$, then a first
approximation to the error arising from them in the average is

$$\frac{2E}{3\sqrt{n}}.*$$

* It is as likely as not that so great an error would be obtained. See
Part II, Chap. IV.

Thus in the average of one hundred measurements, whose
individual unbiassed errors are about $E$, the resulting error
may be no greater than $\frac{2}{3} \times \frac{E}{\sqrt{100}} = \frac{E}{15}$. There is no
counterbalancing tendency, on the other hand, in the biassed
errors; if each estimate was 10 per cent. in excess, then the
average is also 10 per cent. in excess. When aiming at
accuracy our principle always is to take care of
the pounds, and let the pence take care of themselves;
and it is quite futile to diminish the unbiassed errors,
that is to increase the precision of our measurements, while a
large biassed error runs through them all. If we do not know
of the existence of biassed errors, which in reality pervade
our estimates, there is no remedy; if we do know of them, we
are likely to obtain more accuracy by the most erroneous
corrections for them than by neglecting them; for when we make
unbiassed corrections for our biassed errors, we reduce them to
unbiassed errors, and then the more terms we include in our
average the smaller is our resulting error. If, for instance,
we find that the average weekly wage of agricultural labourers
throughout the country is 13s., and by considering the
circumstances of the thousand returns which we may suppose led
to this average we have reason to suppose that an error of
1s. would be typical of the unbiassed errors in them, then
an error of $\frac{2}{3}$ of $\frac{1}{\sqrt{1000}}$s., that is only a farthing, may be expected
to result in the average. We have here a totally illusive
accuracy; the part of the labourer's income which we have
not included, payments at haytime and harvest, facilities for
piece-work, cheap rent for cottage and land and smaller
perquisites, is not capable of exact calculation. If we omit
all these entirely we shall leave an error in our average of 2s.
or so; but if we make individual estimates of these additions,
in all the thousand cases, though each estimate may be 2s.
wrong, if there is no bias, the resulting error in the average
may be expected to be $\frac{2}{3}$ of $\frac{2}{\sqrt{1000}}$s., that is only $\frac{1}{2}$d.: our
whole error now may be less than 1d., instead of 2s. In
estimating the accuracy of published averages, these principles
should be always borne in mind, and the possibility of biassed
errors always considered.
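
The farthing and the halfpenny of the agricultural example follow directly from the first approximation $2E/(3\sqrt{n})$. The sketch below merely evaluates that expression; the function name is chosen here for illustration, and the shilling is taken at twelve pence.

```python
import math


def typical_error_of_average(typical_error, n):
    """First approximation 2E / (3 * sqrt(n)) to the error left in an average
    of n estimates whose unbiassed errors are typically of size E."""
    return 2.0 * typical_error / (3.0 * math.sqrt(n))


PENCE_PER_SHILLING = 12

# One thousand returns, each typically 1s. or 2s. wrong, without bias.
for shillings in (1, 2):
    pence = typical_error_of_average(shillings * PENCE_PER_SHILLING, 1000)
    print(shillings, round(pence, 2))  # about 0.25d. and 0.51d.
```

A bias of 2s. running through every return would, by contrast, leave the whole 2s. (24d.) in the average, however many returns were taken.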

When we are dealing with the errors of a ratio the case is
quite different. The error of a ratio is approximately equal to
the difference between the errors in its terms; if
$\eta$, $\eta'$ and $e$, $e'$ are the biassed and unbiassed errors
in the terms, then by Rule V. $(\eta' - \eta) + (e' - e)$ is the error
in the ratio. Now the unbiassed error $(e' - e)$ is likely to be
of nearly the same magnitude as either $e$ or $e'$;* if, as in the
above example, $e$ and $e'$ are unlikely to be much greater than $E$,
$(e' - e)$ would be unlikely to be much greater than $E\sqrt{2}$. But
$(\eta' - \eta)$, the result of the biassed errors, will, if the bias in both
terms of the ratio was in the same sense (positive in both, or
negative in both), be less than the original errors. If we have
made the estimates of both terms on precisely similar methods,
if we have asked the same questions of the same classes of
persons, included and omitted the same details on both
occasions, we shall have made nearly the same errors of bias in both
estimates. To return to our previous illustration, if we have
made the glaring mistake of omitting everything except
average weekly wages in the income of an agricultural labourer
on both occasions, the only resulting error in the ratio will be
that due to the change in the proportion that these extra
payments bear to ordinary wages, which in short periods is
likely to be small. Or, if we had taken summer wages as the
average for the year in both cases, the error in the ratio will
depend only on the change in the relation of summer wages to
that average. Hence the error in the ratio of two estimates
at different dates of a slowly changing quantity is, if the
estimates are made on similar methods, often much smaller
than the error in either estimate singly; for the unbiassed error
is little greater, and the more important biassed error is much
diminished. We need not now know of the existence of the
biassed errors; they will disappear of themselves. If we are
aware that there are biassed errors, and have any means of
making fairly good estimates of them, it will be worth doing;
but we shall make a great mistake if we correct the bias in
one year and leave it uncorrected in another. For purposes of
comparison it is very seldom of much use and often of great
disutility to make the later estimate more accurate than the
earlier. The error resulting from unbiassed errors
can indeed be diminished a little,† but the error
resulting from the more important biassed errors
will only be increased. All Government officials and others
who compile annual returns are in a dilemma: to make their
annual statements accurate in themselves, they should always
be straining after improvements, they should always be watching
for changes in the quantities measured and adapting their
methods and tabulations to these changes; but to make their
annual returns comparable with each other, they should be
absolutely conservative, and cling to any mistakes they or their
predecessors have made in the past with all the strength red
tape can give them, being careful, however, not to add to the
mistakes or make new omissions. The dilemma can in some
cases be avoided; for when an improved method is introduced,
the tabulation can sometimes be given for a few years both on
the old and on the new plans; then when the difference
introduced by the change is known, the earlier figures can be
brought to the greater precision of the later. Thus the Board
of Trade since 1898 has included in the tabulation of exports
ships which, leaving our shores with merchandise, are themselves
sold to a foreign owner; and we have the following
tabulation:

                                           1899.            1898.

Exports of Home Products (exclusive
  of ships sold to foreigners)             £255,465,000     £233,359,000
Re-exports of Home and Colonial
  Merchandise                                65,020,000       60,655,000

Total                                      £320,485,000     £294,014,000
Value of New Ships exported                   9,195,000      Not stated.

New total                                  £329,680,000

* If $E$ is the probable error in $e$ or $e'$, then $E\sqrt{2}$ is the probable error
in their difference; see Part II, Chap. III.

† For if $E$ and $E_1$ be typical of the unbiassed errors at the two dates,
then $\sqrt{E^2 + E_1^2}$ is typical of the error in the ratio, which diminishes with
either $E$ or $E_1$. See Part II, Chap. IV, formula (66).

Ignorance of slight alterations in the collection and tabulation
of material has been the cause of many statistical mistakes.
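
The disappearance of a common bias when a ratio is taken, as in the case of perquisites omitted at both dates, can be shown with a small numerical sketch. The wages and extra payments below are hypothetical figures chosen for illustration, and the function name is not from the text.

```python
def relative_error(estimate, true_value):
    return (true_value - estimate) / estimate


# Hypothetical weekly figures in shillings: money wages are recorded,
# extra payments are omitted at both dates.
wages_then, extras_then = 13.0, 2.0
wages_now, extras_now = 15.0, 2.25

# Biassed error in each estimate taken singly.
bias_then = relative_error(wages_then, wages_then + extras_then)
bias_now = relative_error(wages_now, wages_now + extras_now)

# Error in the ratio of the two estimates.
estimated_ratio = wages_now / wages_then
true_ratio = (wages_now + extras_now) / (wages_then + extras_then)
error_in_ratio = relative_error(estimated_ratio, true_ratio)

print(round(bias_then, 3), round(bias_now, 3))  # each about 0.15
print(round(error_in_ratio, 4))                 # about -0.0033
print(round(bias_now - bias_then, 4))           # Rule V: about -0.0038
```

Each estimate is some 15 per cent. too low, yet the ratio is wrong by about a third of 1 per cent. only, since the proportion of extras to wages has changed little between the two dates.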

To sum up the chief results of this chapter: there are two
processes which tend to accuracy: averaging, which diminishes
unbiassed errors; and comparison, which diminishes
biassed error. The errors in weights are
seldom so important as the other errors which are present in
estimates. Errors in a result cannot, of course, be calculated, 
but can be expressed in terms of errors in the items, from 
which it comes; we cannot attain certainty, but we can 
indicate processes which diminish errors, and with the help of 
mathematics measure the extent of diminution. Initial errors 
are diminished most, when we calculate the ratios of weighted 
averages of similar and similarly estimated quantities. Index- 
numbers, which we discuss in the next chapter, are examples 
of this class. 

The accuracy resulting from the process of sampling 
requires more mathematical treatment, and is dealt with in 
Part II, Chapters II, and IV. 





CHAPTER IX. 

INDEX-NUMBERS. 

The discussion of index-numbers supplies so good an illus- 
tration of the principles laid down in the last chapter, and 
index-numbers are so important in themselves, that, though it is 
our intention to avoid special questions, it will be worth while 
to devote a chapter to them. 

Index-numbers are used to measure the change in some
quantity which we cannot observe directly, which we know to
have a definite influence on many other quantities
which we can so observe, tending to increase all,
or diminish all, while this influence is concealed by the action
of many causes affecting the separate quantities in various ways.
Thus, to take three of the quantities to which index-numbers
are applied, the change in the relation of the precious metals
to the work to be done by them affects prices of all commodities,
but very many other causes are at work affecting the
prices of separate groups of commodities; there are general
causes tending to raise the wage of a week's work of average
skill, but this general increase is concealed by numberless minor
causes affecting different grades of labour in different degrees;
the change in the consumption of goods by the working or other
classes is a sufficiently definite quantity, but it can only be
measured indirectly by observing the varying changes in the
consumption of individual articles.

The use of index-numbers is not, however, confined to these 
instances, but is nearly co-extensive with the field of statistics ; 
for we have limited the term statistics to the measurement of 
complex groups and their changes ; the object of statistics is to 
measure the action of the general laws which govern a hetero- 
geneous group, and the changes produced by general forces can 
be measured, as a rule, only by their effect in individual cases ; 
thus the method of index-numbers is at once applicable to the
disentanglement of that which is common to the whole group
from those variations which are special to individual items.

In the more restricted sense a series of index-numbers is a
series of weighted averages, calculated periodically, where the
quantities averaged are similar (prices or wages),
and the weights are defined so as to give the
actual average of the whole group concerned in each measurement.
In its less restricted sense a series of index-numbers is a
series which reflects in its trend and fluctuations the movements of
some quantity to which it is related. Where the weights and the
quantities are both known exactly, the method of index-numbers
is merely a convenient way of expressing straightforward
arithmetical results in a simple manner; this simplicity can be
nearly realised in index-numbers of prices of exports. Where
the quantities are samples selected from a wide group, and there
is no obvious method of deciding their relative importance, the
index-numbers have a less direct relation to the movement of a
definable and measurable phenomenon; such is the nature of
most price index-numbers and of some wage index-numbers.
Where the quantities are not direct measurements of examples
of the phenomena which it is desired to study, but of allied
phenomena, then the connection between the series of index-numbers
and the phenomena is indirect; such, in fact, are most
of the index-numbers of wages and of employment.*

* Parts of these pages are taken from the Statistical Journal, 1912, pp.

The most ordinary way of forming an index-number is intermediate
between the extremes of exactness and of indirect
relation. Thus in the Labour Department's index-number of
the change of rates of wages, the objective is presumably to find
numbers, year by year, whose ratios are the same as the ratios
of the average rates of weekly wages of persons in regular
industrial work in the United Kingdom; at least, the numbers
are generally quoted in this sense, and the heading in the
Abstract of Labour Statistics is "General Course of Wages in
the United Kingdom" (e.g., XVIth Abstract, Cd. 7131, p. 82).
This index-number is obtained by selecting some hundreds of
recognised time or piece rates, expressing each as a percentage
of its amount in 1900, and averaging the results year by year.
The choice of weights in this average is indirect; each of the
five groups (building, coal-mining, engineering, textiles,
agricultural) is taken as of the same importance, while the building
group contains 74 items, agriculture 115, etc. Mr. Sauerbeck
obtains his index-number of prices by selecting the prices of
typical commodities, and weights them by the device of duplicating
quotations for the more important. Thus in these and
other cases we have a selection of "quantities," whether
deliberately or by accident, and an assignment of "weights,"
whether directly or indirectly. It is then hoped that the
numbers will move in direct proportion to the phenomenon,
average wage or average level of prices, whose measurement is
attempted.
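
The construction described here, each quotation expressed as a percentage of its base-year amount and the percentages then averaged, with weight given by duplicating the more important quotations, can be sketched as follows. The rates are hypothetical, the function name is invented for illustration, and no claim is made that either the Labour Department or Mr. Sauerbeck computed in exactly this form.

```python
def index_number(base, current, weights=None):
    """Weighted average of relatives, each relative being the current amount
    expressed as a percentage of the base-year amount.

    A whole-number weight of 2 has the same effect as entering a quotation twice.
    """
    if weights is None:
        weights = [1] * len(base)
    relatives = [100.0 * c / b for b, c in zip(base, current)]
    return sum(w * rel for w, rel in zip(weights, relatives)) / sum(weights)


# Hypothetical weekly rates (shillings) for four occupations.
base_year = [30.0, 24.0, 36.0, 20.0]
later_year = [33.0, 27.0, 37.8, 23.0]

simple = index_number(base_year, later_year)
duplicated = index_number(base_year, later_year, weights=[2, 1, 1, 1])

print(round(simple, 3), round(duplicated, 3))  # 110.625 110.5
```

Doubling the weight of the first quotation moves the index by about one-eighth of a point only, in keeping with the remark that errors in weights are seldom so important as errors in the quantities.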

In such cases three points call for consideration — (1) The 
nature and extent of the group and the nature of its special 
property whose general change is studied. (2) The method of 
choosing samples. (3) The effect of weights. (1) With Mr. 
Sauerbeck's numbers the group is Prices of wholesale com- 
modities in the United Kingdom ; and with other index-numbers 
the groups are the prices of goods exported, of goods imported, 
and so on. In the Labour Department's wage index the group is
two-fold and consists of (a) rates of weekly time wages, (b) piece
rates, and the result is hybrid. It is essential to define both the
extent of the group and the property or attribute which is to be
measured. The property is sometimes elusive, as the "purchasing
power of money" or "the amount of unemployment," and in
such cases we have to define and measure an allied attribute,
such as the level of prices or the number unemployed according
to some chosen definition of unemployment. (2) In choosing
samples, the rule generally followed is to take only those where 
the definition is adequate and the measurement accurate, and 
in the best known index-numbers the choice is then so limited 
that all quotations which satisfy the rule are included. It very 
often happens that in this way the definition of the group must 
be reconsidered and limited. Thus if we start out to measure 
prices in general, the necessities of definition generally limit us 
to wholesale prices of goods which have regular market quota- 
tions ; and in wages the Labour Department is limited to cases 
where wages or rates are agreed on or standardised (except 
in the case of agriculture). In order that the resulting index-number
should be subject to the analysis of the law of error the
samples should be random and independent in their fluctuations
from the general movement; dependence increases the number
of samples necessary for an assigned precision. Randomness
may, perhaps, be secured by the accidents which make the
samples eligible; this is probably the case with wholesale prices
but not with wages. Where the selection is biassed, we may
sometimes obtain safety by further restriction of the definition
of the group. (3) If the number of independent quantities is
at all considerable, any reasonable system of weights is likely
to give as good a result as the conditions of the problem
allow.

Suppose that the changes in a group of quantities are deter- 
mined by one general force which acts on all in the same sense, 
that is, tends to increase all or decrease all, and by several other
forces each of which acts on one or more of the quantities, and
some of which tend to increase, others to decrease the quantities
they affect; then of the special forces, some will tend to increase,
others to diminish the average, while the general force will
have a cumulative effect entirely towards increasing, or entirely
towards diminishing it. If the separate effects of the special
forces are small compared with their number, they will tend to
neutralise one another in their influence on the average; and
the change in the average will show the influence of the general
cause only. In the language of the last chapter, the special 
forces produce unbiassed changes, which are negligible in their 
effect on an average, in comparison with the biassed changes 
produced by the general force. 
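
The neutralising of the special forces can be illustrated by a small simulation. This is only a sketch under stated assumptions, not part of the original argument: the function name `average_change`, the normal distribution of the special disturbances, and all the figures are hypothetical.

```python
import random

def average_change(general_shift, special_sd, n_items, seed=0):
    """Average percentage change over n_items quantities.

    Every quantity receives the same general shift plus its own special
    disturbance. The disturbances are drawn (an assumption) from a normal
    distribution with mean zero, so that they are unbiassed.
    """
    rng = random.Random(seed)
    changes = [general_shift + rng.gauss(0.0, special_sd) for _ in range(n_items)]
    return sum(changes) / n_items

# Hypothetical figures: a general rise of 5 per cent., special
# disturbances with a standard deviation of 10 per cent.
few = average_change(general_shift=5.0, special_sd=10.0, n_items=10)
many = average_change(general_shift=5.0, special_sd=10.0, n_items=5000)
print(f"10 items:   average change = {few:.2f}")
print(f"5000 items: average change = {many:.2f}")
```

With many items the average should lie close to the general shift, while with few items the special disturbances may still be visible in it.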

It appears from consideration of many of the index-numbers 
in ordinary use that the quantities actually measured are not 
those whose general movement we wish to know. Wholesale 
prices do not move with retail prices in accordance with any 
simple law, either of constant difference or constant ratio; 
standard wages differ in an unknown way from average wages ; 
piece-rates have a varying and unknown relation to earnings. 
We do not get any such simple relations between the quantity
that can be measured and the property that is really in question
as y = x, or y = kx, or y = a + bx; but rather y = f(x), where
the form of the function is unknown. In order that the index-number
may be intelligible, y = a + bx must be a good approximation
over the ordinary range of x; for extreme values of x
terms of higher powers may become important and the resulting
index untrustworthy. Here a disappears in the process of
forming an index-number. It is often difficult to determine b,
which measures the ratio of a change in y to that of a change
in x.
The resulting index-number for any assigned year may be
defined and expressed thus: Let $x_1, x_2 \ldots x_n$ be n quantities
whose general movement is to be studied. Let $y_1, y_2 \ldots y_r \ldots y_n$
be measured quantities related to the former by equations
of a form to which $y_r - 100 = b_r(x_r - 100)$ is a good approximation.*
Let suitable weights $w_1, w_2 \ldots$ be assigned and
write J for $\dfrac{w_1x_1 + w_2x_2 + \ldots}{w_1 + w_2 + \ldots}$ and I for $\dfrac{w_1y_1 + w_2y_2 + \ldots}{w_1 + w_2 + \ldots}$.

Then J is the incalculable theoretic index-number whose changes
express the movement, and I is the index-number calculated
and used.

$$I - 100 = \frac{\Sigma w(y - 100)}{\Sigma w} = \frac{\Sigma wb(x - 100)}{\Sigma w}.$$

Let $b_1 = k + d_1$, $b_2 = k + d_2 \ldots$, where k is chosen, if
possible, as that average of the b's which makes
$$\frac{\Sigma wd(x - 100)}{\Sigma w} \ (= F, \text{say})$$
small for the ordinary range of values of the x's.

Then
$$I - 100 = \frac{k\,\Sigma w(x - 100)}{\Sigma w} + \frac{\Sigma wd(x - 100)}{\Sigma w} = k(J - 100) + F.$$

If the x's have, in general, only a moderate range of values, if
the b's are nearly equal and extreme values of b do not coincide
with extreme values of w, then F is small and its variation from
year to year negligible.

In this case I is so related to J that it equals 100 in the
standard year when J (and every x and y) is 100, and a change in
its value is very nearly k times the change in J, where k is an
average of the b's which measure the ratio of the changes of
the various y's to those of the corresponding x's.
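
The relation between I, J, k and F can be checked numerically. The sketch below is hypothetical throughout: the function name `index_decomposition` and all the figures are invented for illustration, and k is taken as the w-weighted mean of the b's, which is only one possible choice of "that average of the b's."

```python
def index_decomposition(x, b, w):
    """Return (I, J, k, F) for true quantities x, slopes b and weights w.

    Each measured quantity is y = 100 + b * (x - 100). As an assumption,
    k is taken as the w-weighted mean of the b's, and d = b - k.
    """
    total_w = sum(w)
    y = [100 + bi * (xi - 100) for xi, bi in zip(x, b)]
    J = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    I = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    k = sum(wi * bi for wi, bi in zip(w, b)) / total_w
    F = sum(wi * (bi - k) * (xi - 100) for wi, bi, xi in zip(w, b, x)) / total_w
    return I, J, k, F

x = [104.0, 110.0, 97.0, 106.0]  # hypothetical true quantities (base = 100)
b = [0.90, 1.10, 1.00, 0.95]     # hypothetical slopes
w = [2.0, 1.0, 3.0, 2.0]         # hypothetical weights

I, J, k, F = index_decomposition(x, b, w)
print("I - 100        =", I - 100)
print("k(J - 100) + F =", k * (J - 100) + F)
print("F              =", F)
```

When all the b's are made equal, F vanishes and the change in I is exactly k times the change in J.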

If we try to make a retail price index-number out of whole- 
sale prices, the b's are not known, and presumably differ greatly 
from one commodity to another, from time to time, and vary in 
an unknown way when prices are specially high or low. Hence 
the connection between general retail prices and wholesale prices 
is not so close as to allow the statement that a change in the 
one is directly proportional to a change in the other. In the 
case of the Labour Department's index-number of wages, the 
changes in time-rates have not the same relation to earnings 
as have those in piece-rates, and in neither group is the relation
known; that is, the b's are unknown and are not equal. So
far as piece-rates are concerned, when these rates are rising it
often happens that earnings rise more rapidly (b less than 1),
and when they fall that the earnings fall less rapidly (b greater
than 1); that is, the b's are not constant and F, in the formula
just given, is unknown and not negligible.

* This equation can readily be obtained by a rearrangement from
y = a + bx.

If the b's are equal, F is zero, and k could be determined by 
special examinations in two years. Then the movements of I 
would reflect faithfully the movements of J on a known scale. 

The actual relations are not known; if an x is 4 per cent.
above the average, we do not assume that y is also 4 per cent.
above its average, but assume that its deviation is 4 per cent.
× b, where b is nearly constant. The b's differ from some
(weighted) mean value ( k ), and it is assumed that the effect of 
these differences nearly disappears when the average is taken, 
and that the mean value, k, is nearly constant from year to 
year. Various hypotheses can be made as to the values of 
the b's and the resulting value of k, and the fluctuations of the 
index-numbers interpreted. 

It is essential that when an x returns to a value after a
fluctuation, the corresponding y shall return to its former value, 
or at least that any differences shall be small and unbiassed. 
This condition would be broken if wholesale prices were used to 
measure the changes in retail prices, while the relation between 
the two gradually changed, as presumably it does. It is broken 
in the Labour Department's index of the general course of 
wages, in so far as changes in standard wages or piece-rates 
have a varying relation to changes in average wages. 

There are many index-numbers of wholesale prices extant,
some of which we may pass in review. The Board of Trade
publish the recorded quantity and value of goods
imported and exported, and the average prices of
these goods can be calculated. Those commodities are selected
which occur in the returns for the whole period chosen. A
particular year is chosen as base; then the goods are valued
in all other years separately at their prices in the base year;
the total of these values in any year is the sum which the
goods would have been worth if their prices had remained
unchanged; the ratio of this value to that actually recorded
is the ratio of their average price in the base year to their
average price in the other year selected (if the term average
is used broadly), and if the first term of this ratio is equated
to 100, the second term is the index-number required for the
year selected, expressed as a percentage of the number for the
base year. It is at once evident that we are here dealing with
weighted averages.

Let $p_1, p_2, p_3 \ldots$ be the prices in the base year of units
of the goods selected, and $r_1p_1, r_2p_2, r_3p_3 \ldots$ the prices in
the year for which we require an index-number:
then $r_1, r_2, r_3 \ldots$ measure the changes of prices
for the separate commodities, and these r's are the samples
from which we are to deduce the general change of price.
The weights used in the process described may be found
thus: let $b_1, b_2, b_3 \ldots$ be the numbers of units of goods in
the selected year; then the total value in the selected year
at the prices of that year is $(b_1r_1p_1 + b_2r_2p_2 + \ldots)$, and at the
prices of the base year is $(b_1p_1 + b_2p_2 + \ldots)$; the ratio is
$\Sigma brp : \Sigma bp$, and the index-number for the selected year is

$$100 \times \frac{\Sigma brp}{\Sigma bp} = 100 \times \frac{\Sigma(r \cdot bp)}{\Sigma(bp)}.$$

Here the weights applied to the r's are the values which the 
corresponding goods in the selected year would have borne at 
the prices of the base year. It is clear that the selection of the 
standard year affects the weights, for any particular commodity 
can be given special weight by choosing as base a year in which 
its price is high, and much trouble has been spent in searching 
for a "normal" year; but though the weights of separate commodities
are affected, it does not follow that the average will
be altered, and we should expect from the principle laid down
above that the change would be very slight. In fact we have
the following figures:


INDEX-NUMBERS OF 1886 AND 1883 COMPARED.*

                        Imports.                           Exports.
Weights:       Values at the prices of            Values at the prices of
             1873    1883    1861    1881       1873    1883    1861    1881

1883         100     100     100     100        100     100     100     100
1886         81.7    82.1    82.9    82.3       88.1    88      87      89

* From the Economic Journal and the Statistical Journal, both June 1897.
It is possible to produce figures which show a variation
caused by a change of base year, but it is done by choosing 
samples which lend themselves to the special argument. 
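
The Board of Trade calculation, and its reading as a weighted average of the price ratios, can be sketched as follows. The function names and all the figures are hypothetical; only the two formulas come from the text.

```python
def board_of_trade_index(base_prices, ratios, quantities):
    """100 * sum(b*r*p) / sum(b*p): the goods of the selected year valued
    at that year's prices, relative to the same goods at base-year prices."""
    at_current = sum(b * r * p for b, r, p in zip(quantities, ratios, base_prices))
    at_base = sum(b * p for b, p in zip(quantities, base_prices))
    return 100 * at_current / at_base

def weighted_mean_of_ratios(base_prices, ratios, quantities):
    """The same index written as a mean of the r's with weights b*p."""
    weights = [b * p for b, p in zip(quantities, base_prices)]
    return 100 * sum(wt * r for wt, r in zip(weights, ratios)) / sum(weights)

p = [4.0, 10.0, 2.5]      # hypothetical base-year unit prices
r = [0.80, 0.90, 0.85]    # hypothetical price ratios, selected year to base year
b = [300.0, 50.0, 400.0]  # hypothetical quantities in the selected year

direct = board_of_trade_index(p, r, b)
as_mean = weighted_mean_of_ratios(p, r, b)
print(direct, as_mean)
```

The two functions return the same number, which is why the choice of base year enters only through the weights b·p.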

Since so great an alteration in choice of weights makes so
little difference, it is worth while to see if we need even keep
the weight due to the quantities imported (the b's in the above
formulae). The following table may be quoted * to show that
these weights even have little influence:


Index-Numbers for 1895, when that of 1881 is 100, obtained by
Various Systems of Weighting.

           Weighted by    Weighted by   Ratios of Prices (r_1, r_2 ...)      Reciprocal    Economist's
           Values of      Declared      Arithmetic   Median.   Geometric     of A.M. of    Figures.
           1895 Quan-     Values in     Mean.                  Mean.         1/r_1,
           tities at      1881.                                              1/r_2 ...
           1881 Prices.

Imports    67½            69            73½          [?]       72½           69            } 71
Exports    83             87            82           [?]       78½           75            }


Let $b_1, b_2 \ldots$ be quantities and $p_1, p_2 \ldots$ prices in 1881,
and let $c_1, c_2 \ldots$ be quantities and $r_1p_1, r_2p_2 \ldots$ prices in 1895.
The first column gives the result of

$$100 \times \frac{\text{Sum of 1895 quantities at 1895 prices}}{\text{Sum of 1895 quantities at 1881 prices}} = 100\,\frac{\Sigma(r \cdot cp)}{\Sigma(cp)},$$

and the weights applied to the r's are the 1895
quantities valued at the 1881 prices.

The second column gives the result of

$$100 \times \frac{\text{Sum of 1881 quantities at 1895 prices}}{\text{Sum of 1881 quantities at 1881 prices}} = 100\,\frac{\Sigma(r \cdot bp)}{\Sigma(bp)},$$

and the weights applied to the r's are the declared
values of 1881.

In the next three columns the arithmetic mean, the median,
and the geometric mean of the r's are given. In the last column
but one the arithmetic mean of $\frac{1}{r_1}, \frac{1}{r_2} \ldots$, that is of the ratios
of the prices of 1881 to 1895, is calculated, and the ratio of this
mean to 100 equals the ratio of 100 to a new index-number,
which corresponds to the former arithmetic mean with the
years 1881 and 1895 interchanged. The figure in the last
column is calculated from material given in the Economist;
every year the imports and exports are valued at their prices
in the previous year, and thus an annual ratio is given similar
to that in the first column of figures in the table just given;
the number 100, taken for 1881, is multiplied by this annual
ratio year by year till 1895, and the number 71 is the result.
[Algebraically this index-number is:

$$100 \times \frac{\Sigma(r \cdot bp)}{\Sigma(bp)} \times \frac{\Sigma(r_1 \cdot b_1p_1)}{\Sigma(b_1p_1)} \times \ldots]$$

* From the Economic Journal (with a correction in the statement of
weights).

A more complete analysis of these figures, and an investiga- 
tion as to the causes of the divergence between the export 
indices 87 and 75, would show which of the methods should be 
adopted. Here we will be content with noticing that the 
unweighted average, 82, is very near the first weighted 
average, 83. 
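
The unweighted averages named in the table can be computed from any list of price ratios. The sketch below uses Python's standard `statistics` module; the function name `unweighted_indices` and the ratios are hypothetical, and are not the figures behind the table.

```python
from statistics import mean, median, geometric_mean, harmonic_mean

def unweighted_indices(ratios):
    """Four unweighted index-numbers (base year = 100) from price ratios r."""
    return {
        "arithmetic": 100 * mean(ratios),
        "median": 100 * median(ratios),
        "geometric": 100 * geometric_mean(ratios),
        # The reciprocal of the arithmetic mean of 1/r is the harmonic mean.
        "reciprocal": 100 * harmonic_mean(ratios),
    }

ratios = [0.62, 0.70, 0.75, 0.81, 0.88, 0.95, 1.04]  # hypothetical r's
result = unweighted_indices(ratios)
for name, value in result.items():
    print(f"{name:11s} {value:6.1f}")
```

For ratios that are not all equal, the reciprocal form lies below the geometric mean and the geometric mean below the arithmetic, which agrees with the order of the export figures 75, 78½ and 82.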

Further methods of dealing with such weights are given on
pp. 209-211, under Retail Index-Numbers.

The advantage of index-numbers on the Board of Trade
basis is that they measure approximately an objective quantity,
and a result is obtained which can be stated in
terms which appeal to the ordinary man who is
not a statistician: such as, "The imports of 1895 would
have cost half as much again if their prices had been those
of 1881;" but it does not follow that this index is the best
measure of the less-definable quantity, "Fall in the price of
imports," where we imagine a general cause affecting this
class of commodities whose action is modified by other partial
causes.

It is important to choose a normal year or the average of a
period as base, for the choice of year affects the
effective weights in subsequent comparisons. Using
the following notation:

Weights    Price in      Price in        Price in       Ratio of Prices in Third
chosen.    Base Year.    Second Year.    Third Year.    and Second Years.

w_1        100           100 r_1         100 r'_1       R_1 = r'_1 : r_1
w_2        100           100 r_2         100 r'_2       R_2 = r'_2 : r_2

and writing 100, $I_1$, $I_2$ as the index-numbers in the three years,
we have

$$\frac{\Sigma w \cdot 100}{\Sigma w} : \frac{\Sigma w \cdot 100r}{\Sigma w} : \frac{\Sigma w \cdot 100r'}{\Sigma w} = 100 : I_1 : I_2,$$

$$\therefore\ I_2 = I_1 \times \frac{\Sigma wr'}{\Sigma wr} = I_1 \times \frac{\Sigma(wr \cdot R)}{\Sigma wr}.$$

Whereas if we had taken the prices all as 100 in the second year
we should have $I_2 = I_1 \times \dfrac{\Sigma wR}{\Sigma w}$.

If the averages were unweighted we should still have the same
difficulty, for then the values would be $\dfrac{\Sigma rR}{\Sigma r}$ on one system and
$\dfrac{\Sigma R}{n}$ on the other.

Since errors in weights have under ordinary circumstances but
little effect, it is only when a quite abnormal base year is chosen, 
or when prices are moving very irregularly, that this consideration 
becomes important. 

Professor Edgeworth has pointed out that the use of the geometric
mean avoids this difficulty in the case of
unweighted averages. In the same notation

$$100 : I_1 : I_2 = 100 : 100\sqrt[n]{r_1 r_2 \ldots r_n} : 100\sqrt[n]{r'_1 r'_2 \ldots r'_n},$$

$$I_2 = I_1 \times \sqrt[n]{\frac{r'_1 r'_2 \ldots r'_n}{r_1 r_2 \ldots r_n}} = I_1 \times \sqrt[n]{R_1 R_2 \ldots R_n},$$

so that the same result is obtained for the comparison of two years
whatever year is taken as base.*
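
The contrast between the arithmetic and the geometric mean under a change of base can be checked with a few invented figures. In the sketch below the ratios r (second year to base year) and R (third year to second year) are hypothetical, and the third-year ratios to the base are formed as r × R.

```python
from math import prod

def arithmetic_index(ratios):
    """Unweighted arithmetic mean of price ratios, base = 100."""
    return 100 * sum(ratios) / len(ratios)

def geometric_index(ratios):
    """Unweighted geometric mean of price ratios, base = 100."""
    return 100 * prod(ratios) ** (1 / len(ratios))

r = [1.10, 0.80, 1.50]  # hypothetical ratios, second year to base year
R = [1.20, 1.05, 0.70]  # hypothetical ratios, third year to second year
r_third = [a * c for a, c in zip(r, R)]  # third year to base year

# Third year compared with second: through the base year, then directly.
arith_via_base = 100 * arithmetic_index(r_third) / arithmetic_index(r)
arith_direct = arithmetic_index(R)
geo_via_base = 100 * geometric_index(r_third) / geometric_index(r)
geo_direct = geometric_index(R)

print("arithmetic:", arith_via_base, arith_direct)
print("geometric: ", geo_via_base, geo_direct)
```

The two arithmetic figures differ, while the two geometric figures agree apart from rounding in the floating-point arithmetic.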


* On this point, and on others in this chapter, see article Index-Numbers,
in Palgrave's Dictionary of Political Economy.

Mr. Sauerbeck and the Economist both avoid in part the
difficulty of weighting the separate ratios by their relative importance
in consumption, by selecting from those
commodities whose prices are most accurately
determined more instances of such widely consumed articles
as wheat than of less important commodities such as linseed.
Mr. Sauerbeck has, in his annual articles in the Journal of the
Royal Statistical Society, verified the correspondence of the unweighted
average of his 45 ratios with the average of the same
weighted on various principles.†

† See, for example, Statistical Journal, 1900, pp. 97, 98.

While the choice of the special weights to be employed is,
when the number of ratios taken is at all considerable, quite
unimportant, the choice of the quantities dealt
with has great effect on the result. Thus import
figures, relating to raw materials and the produce
of other countries, do not lead to the same index-numbers as
export figures dealing with the price of our own produce,
though the tables just given show that they are little affected by
weights; and neither of these agrees closely with Mr. Sauerbeck's
or the Economist's numbers, and these again are not in complete
agreement. The samples on which these four sets of numbers
are based are from different groups of commodities, and the
numbers show that the same forces do not affect these groups
in the same degree. When we have so multiplied our samples
that we can subdivide them without affecting the index-numbers
deduced, we may expect our results to represent the required
measurement.*

If we compare the Economist index-numbers with Sauerbeck's
during the period 1860-70, we see that the former show
a very much greater increase during the cotton
famine than the latter. An index-number which
can be greatly disturbed by fluctuations, however violent, in
only one group of commodities, is clearly wanting in some of
the chief qualities of a general measure of price levels. A very
simple means of avoiding this difficulty, and indeed all the
intricacies of weighting, is to take the median of all the price
ratios of a particular year as the index-number of that year.
It is perhaps impossible to show theoretically that any other
average satisfies the required conditions better than the median,
if a sufficient number of items are included, and there can be
no doubt that it is practically the easiest to calculate.
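
The steadiness of the median under a violent movement in one commodity can be seen in a small invented example. The ratios below are hypothetical and are not taken from either index.

```python
from statistics import mean, median

# Hypothetical price ratios for seven commodities in an ordinary year.
ratios = [0.95, 0.98, 1.00, 1.02, 1.03, 1.05, 1.07]

# The same year, except that the last commodity trebles in price.
disturbed = ratios[:-1] + [3.00]

print("mean:  ", 100 * mean(ratios), "->", 100 * mean(disturbed))
print("median:", 100 * median(ratios), "->", 100 * median(disturbed))
```

The arithmetic mean is carried up by the single disturbed ratio, while the median is unchanged, since the disturbed item was already above the middle of the series.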

* Mr. Sauerbeck's numbers are to be found in annual articles by him
in the Statistical Journal; and a diagram showing them from 1820 is
published by P. S. King & Son.

If, on the other hand, paucity of data makes the inclusion of
weights necessary, and the popular desire for concrete measurements
makes a fine show of weighting expedient,
we perhaps cannot do better than to adopt such
a standard as that proposed by the Committee of the British
Association, for the construction of an index-number, which
might be the basis of business transactions involving future
payments. This standard is as follows:
Basis of Index-Number recommended by the Committee appointed by
the Economic Section of the British Association, 1888.

                        Estimated
                        Expenditure
                        per Annum
Articles.               on each         Weights           Prices to be taken from
                        (000,000's      assigned.
                        omitted).

Wheat                   £60              5   }            Gazette average, English wheat.
Barley                   30              5   }  20        Gazette average, English barley.
Oats                     50              5   }            Gazette average, English oats.
Potatoes, rice, &c.      50              5   }            Av. import price, potatoes.

Meat                    100             10   }            Market quotations, live meat, Smithfield.
Fish                     20              2½  }  20        Board of Trade Returns; average per cwt. landed.
Cheese, butter, milk     60              7½  }            Cheese and butter, average import price.

Sugar                    30              2½  }            Av. import price, refined sugar.
Tea                      20              2½  }            Av. import price, tea.
Beer                    100              9   }  20        Av. export price, beer.
Spirits                  40              2½  }            Av. import price, spirits.
Wine                     10              1   }            Av. import price, wine.
Tobacco                  10              2½  }            Av. import price, tobacco.

Cotton                   20              2½  }            Av. import price, cotton.
Wool                     30              2½  }  10        Av. import price, wool.
Silk                     20              2½  }            Av. import price, raw silk.
Leather                  10              2½  }            Av. import price, hides.

Coal                    100             10   }            Av. export price, coal.
Iron                     50              5   }  20        Market price, Scotch pig-iron.
Copper                   25              2½  }            Av. import price, copper ore.
Lead, zinc, tin          25              2½  }            Av. import price, lead ore.

Timber                   30              3   }            Average import price.
Petroleum                 5              1   }            Average import price.
Indigo                    5              1   }  10        Average import price.
Flax and linseed         10              3   }            Average import price.
Palm oil                  5              1   }            Average import price.
Caoutchouc                5              1   }            Average import price.


American statisticians have adopted a method of comparing
totals instead of weighted or unweighted price-ratios for the
formation of index-numbers. "By so doing, it is maintained,
two difficulties are overcome: First, the problem of choosing
a base year, since actual prices do not necessarily have to be
reduced to a relative basis, and, second, of deciding on an
appropriate average of relatives." * In fact the method,
though it may have advantages in intelligibility and simplicity
of construction, introduces no new principle. It may be thus
described: The price of each article in, say, 1914 is multiplied
by the quantity marketed in the last census year, 1909; the
price in, say, 1912 is multiplied by the same quantity. With
the aggregate for 1914 as the base, or 100, the index-number
for 1912 is obtained by comparing the 1912 aggregate with the
1914 aggregate. If $w_1, w_2 \ldots$ are the quantities, $P_1, P_2 \ldots$
the prices in 1914 and $p_1, p_2 \ldots$ those in 1912, the aggregates
are $\Sigma wP$, $\Sigma wp$, and the index is

$$100\,\frac{\Sigma wp}{\Sigma wP} = 100\,\frac{\Sigma\left(wP \cdot \frac{1}{R}\right)}{\Sigma wP},$$

where $R_1, R_2 \ldots$ are the price ratios 1914 to 1912. This is equivalent
to the Board of Trade index discussed above, and has no special
claim to accuracy.

* Secrist, An Introduction to Statistical Methods, 1917, pp. 329 and 339,
340. See the Bulletin of the United States Bureau of Labor Statistics, Whole
Number 181, October 1915.

Since we can only obtain rough correspondence in dealing
with wholesale prices, we cannot expect to be able to measure
retail prices with any great precision. For we saw
in the preceding chapter that the error in an average
bears a definite relation to the errors in the items which
compose it; if the errors in the items are on the whole doubled,
it is likely that the errors in the average and in the ratio of two
averages will also be doubled, and we shall need four times * as
many samples to restore the precision. Unfortunately the
material for computing a retail index-number is even more incomplete
than that for wholesale prices, and owing to the smaller
number of articles that can be included, and the preponderance
of such items as bread and rent, the question of weighting
becomes of more importance.
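
The statement that doubled errors in the items call for four times as many samples follows if the error of an average varies inversely as the square root of the number of independent items. The sketch below assumes that rule; the function name and the figures are hypothetical.

```python
from math import sqrt

def error_of_average(item_error, n_samples):
    """Typical error of an average of n independent items, assuming it
    equals the typical error of one item divided by sqrt(n)."""
    return item_error / sqrt(n_samples)

original = error_of_average(item_error=1.0, n_samples=100)
doubled = error_of_average(item_error=2.0, n_samples=100)
restored = error_of_average(item_error=2.0, n_samples=400)
print(original, doubled, restored)
```

Doubling the error of the items doubles the error of the average, and quadrupling the number of samples brings it back to its former size.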

* See Part II, Chap. IV.

When we wish to construct an index-number to show the
purchasing power of money of special classes, we must take
into account some considerations which can be
ignored when dealing with wholesale price numbers.
Different classes of persons at the same time, and the
same classes at different times, spend their income in varying
proportions on different objects. If we could collect enough
sufficiently accurate samples, this fact would not matter so
much; but it would still be of some importance owing to the
tendency to make increased purchases of cheapening commodities.
As it is, it would be necessary to construct separate
index-numbers for each class and each district. The difficulty
of insufficient and inaccurate data cannot at present be overcome;
but as it is possible that we may in the future get definite
records of retail prices sufficiently numerous to make up for
their want of precision, we may glance at the other details of
the problem. To form an index-number for a particular class
of people, we need records of the method of expenditure of their
income at all the dates in question, of sufficient numbers to
obtain the slight precision which weighting needs. Then if we
had fairly good records of retail prices several methods of
weighting are open to us,* all of which are likely
to give nearly the same result. The necessity of
weighting and the methods are best shown by a numerical
illustration.†

The data for the measurement of the change of the cost of
living, however it is defined, are always of the same nature, and 
consist of records of the quantities of various commodities 
bought and the prices paid for them at two dates or places or by 
representatives of different social groups. Thus we have given 
with greater or less accuracy : — 


                          Place or Date.
                     A.                                    B.
Commodity.   Quantity.  Price.  Expenditure.      Quantity.  Price.  Expenditure.

1            Q_1    ×   P_1  =  E_1               q_1    ×   p_1  =  e_1
2            Q_2    ×   P_2  =  E_2               q_2    ×   p_2  =  e_2
3            Q_3    ×   P_3  =  E_3               q_3    ×   p_3  =  e_3
...
n            Q_n    ×   P_n  =  E_n               q_n    ×   p_n  =  e_n

* See article on Wages, Nominal and Real, in Palgrave's Dictionary of
Political Economy, pp. 640-41.

† Taken with part of the context from "The Measurement of Changes in
the Cost of Living," Statistical Journal, 1919, pp. 343 seq.

In the Table on p. 210 are shown in this form the budgets
used in the Report of the Committee on Cost of Living, 1919.

The second year's budget at the first year's prices would
have cost 225.5d. instead of 455.5d. The index-number of
retail prices on this basis is

$$100 \times \frac{455.5}{225.5} \quad\text{or}\quad 100\,\frac{\Sigma qp}{\Sigma qP} = 202.0.$$

The weight applied to a price ratio p : P is qP. The index-number

$$= 100\,\frac{\Sigma e}{\Sigma\left(e \cdot \frac{P}{p}\right)} \qquad (a)$$

The first year's budget at the second year's prices would cost
521.6d. instead of 246.5d. The index-number on this basis is

$$100 \times \frac{521.6}{246.5} \quad\text{or}\quad 100\,\frac{\Sigma Qp}{\Sigma QP} = 211.6 \qquad (b)$$

The weight applied to a ratio p : P is QP or E. This is
the method used in the Labour Gazette to measure the "average
increase in retail prices."


Urban Working-class Budgets. (Based on Cd. 8980, p. 18.)
Expenditure of Standard Family.

                                     First year.                  Second year (June, 1918).     p/P
                                Q        P        E              q        p        e           Price
                             Quantity. Price. Expenditure.    Quantity. Price. Expenditure.    Ratio.
                                        d.       d.                      d.       d.

 1. Bread and flour   lbs.     33.5     1.51     50.5           34.5     2.36     81.5          1.56
 2. Meat              lbs.      6.8     8.6      58.5            4.4    18.6      82.0          2.15
 3. Bacon             lbs.      1.2    11.7      14.0            2.55   26.1      66.5          2.24
 4. Lard, suet, etc.  lbs.      1.0     7.5       7.5             .78   17.9      14.0          2.29
 5. Eggs              No.      13       1.0      13.0            9.1     4.0      36.5          4.00
 6. New milk          pints     9.2     1.8      16.5           11.7     3.0      35.5          1.69
 7. Condensed milk    tins       .25    6.0       1.5             .59   14.5       8.5          1.42
 8. Cheese            lbs.       .84    8.9       7.5             .41   20.7       8.5          2.32
 9. Butter            lbs.      1.70   14.4      24.5             .79   29.7      23.5          2.07
10. Margarine         lbs.       .42    6.0       2.5             .91   12.1      11.0          2.01
11. Potatoes          lbs.     15.6      .7      11.0           20       1.25     25.0          1.78
12. Rice and tapioca  lbs.      1.4     3.2       4.5            1.3     5.8       7.5          1.82
13. Oatmeal           lbs.      1.3     1.9       2.5            1.4     4.3       6.0          2.24
14. Tea               lbs.       .68   21.3      14.5             .57   33.3      19.0          1.56
15. Coffee            lbs.       .09   16.7       1.5             .12   25.0       3.0          1.50
16. Cocoa             lbs.       .18   19.4       3.5             .23   32.6       7.5          1.69
17. Sugar             lbs.      5.9     2.2      13.0            2.83    7.07     20.0          3.21

    Total                                       246.5                            455.5
    Other food                                   52.5                            111.5
    Total                                       299.0                            567.0

ΣQP = 246.5     Σqp = 455.5     ΣQp = 521.6     ΣqP = 225.5

Σe ÷ ΣE = 1.90     ΣQp ÷ ΣQP = 2.12     Σqp ÷ ΣqP = 2.02


In some cases there may be reasons for preferring (a) or
preferring (b). If not, it is reasonable to take a mean between
the results; the arithmetic mean is 206.8, the geometric mean
is 206.74, the harmonic mean 206.69, and it is usually indifferent
which we take. Or a method which may be commended for
its simplicity in idea is to take the averages of the quantities
seriatim $\left(\tfrac{1}{2}(Q_1 + q_1),\ \tfrac{1}{2}(Q_2 + q_2) \ldots\right)$ and find their cost in each
year and compare their sums. This gives

$$\frac{\Sigma(Q + q)p}{\Sigma(Q + q)P} \times 100 = 203.7.$$

The weight applied to a ratio is now (Q + q)P ... (c)
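
The index-numbers (a) and (b) and the three means quoted above can be checked from the four totals printed beneath the budget table. The sketch below uses only those totals; the variable names are chosen for illustration.

```python
from math import sqrt

# Totals printed beneath the budget table (pence).
SUM_QP = 246.5  # first-year quantities at first-year prices
SUM_qp = 455.5  # second-year quantities at second-year prices
SUM_Qp = 521.6  # first-year quantities at second-year prices
SUM_qP = 225.5  # second-year quantities at first-year prices

index_a = 100 * SUM_qp / SUM_qP  # method (a)
index_b = 100 * SUM_Qp / SUM_QP  # method (b)

arithmetic = (index_a + index_b) / 2
geometric = sqrt(index_a * index_b)
harmonic = 2 * index_a * index_b / (index_a + index_b)

print(round(index_a, 1), round(index_b, 1))
print(round(arithmetic, 1), round(geometric, 2), round(harmonic, 2))
```

The three means differ only in the second decimal place, which is the sense in which it is indifferent which of them is taken.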
Another method is to take the average of the expenditures
at the two dates on each item as the weight for the price ratio
of that item, so obtaining

$$\frac{\Sigma\left\{\tfrac{1}{2}(E + e) \cdot \frac{p}{P}\right\}}{\Sigma\,\tfrac{1}{2}(E + e)} \qquad (d)$$

This, however, involves the quantity $p^2/P$ in the numerator
and gives undue weight to exceptional movements of prices
of particular commodities.

In absence of knowledge of quantities the simple average
of the price ratios

$$\frac{1}{n}\,\Sigma\frac{p}{P} \times 100 = 209.1 \qquad (e)$$

is sometimes taken; but it is never safe to neglect weights in
this problem, though it is not necessary to aim at great precision
in them.

Finally, a more complicated method has been advocated in
which it is supposed that the second total is expended in the
same proportion item by item as the first, and the quantities
of each item thus purchasable are valued at the price in the
first year. The ratio of the whole actual expenditure in the
second year (× 100) to the expenditure so calculated

$$= \frac{100\,\Sigma e}{\Sigma\left\{E \times \frac{\Sigma e}{\Sigma E} \times \frac{P}{p}\right\}} = 100\,\frac{\Sigma E}{\Sigma\left(E \cdot \frac{P}{p}\right)} = 196.4 \qquad (f)$$

The weight applied here to the ratio p : P is $QP^2 \div p$, and as in
case (d) gives undue weight to particular prices. Also there is
no reason to suppose that the expenditure is kept in a constant
ratio item by item.

No agreement has been reached on the question which 
method is the best for the measurement of retail prices ; but 
there are serious theoretical objections to (d), (e), (f). There is
nothing in general to choose between (a) and (b), but for this
purpose one year has the same claim to be included as the
other and we are therefore obliged to take a mean. Of the
various means the method (c) of averaging the quantities is the
most sensitive, is quite easy to compute, and on all grounds is to
be recommended.*

* This opinion is different from that expressed in former editions. For
further information see the bibliography in the article on Workmen's Budgets
in Palgrave's Dictionary of Political Economy.

The problem of measuring the movement of retail prices
has been generally confused with that of measuring the change
in cost of a standard (representing either minimum subsistence
or efficiency subsistence) with the items the same at both dates.
It is not proposed here to discuss such a measurement in detail,
but it should be realised that there is a continual change in
the prices and supply of the various commodities. For such
budgets it ought to be assumed that the same nourishment
(or more generally the same satisfaction) is obtained at each
date by the most economical purchases, so that the quantities
of those foods whose price has risen least or fallen most are
increased while others are diminished, and consequently an
upward movement is less and a downward movement greater
than that measured by method (a).*

There are still two further considerations which hinder
the complete solution of the problem. In all budgets rent is
an important item, and there seems no prospect
of obtaining any good estimate of the relation
between increasing rent and improving accommodation, allowing
for the benefits of public expenditure paid by rates included
in rent. Again, if we consider, not how money is spent, but
how it might be spent, we should have to introduce a more
general factor; for the margin which remains when necessities
are satisfied may have a rapidly growing purchasing power,
as the products of machinery increase in variety and diminish
in price; perhaps the calculated fall in wholesale prices forms a
fair measure of this growth.

[Marginal note: Index-number of consumption.]

Leaving this very difficult problem, let us return for a
moment to the measurement of a quantity more typical of
index-numbers.† If we have to measure the action of a cause,
which affects quantities which have no common
measure, we are still able to apply index-numbers.
A general increase has taken place in the consumption of
imported goods, and if we can measure this increase independently
of any change in price, we can use it for criticism of any
measurement of a movement in real wages. The only common
measure of bread, currants, cheese, meat, etc., of practical value
is their price, their weight being useless for the purpose;
consequently another method is necessary. If the quantities
consumed year by year of a number of such commodities are
written down, expressed as percentages of the consumption
in any years (not necessarily the same), we have series of
numbers which only need weighting to form the index-number
required. We can in this case verify that any logical choice
of weights, based on their value or their assumed importance,
or even a random system of weights, gives much the same
index-number as the simple arithmetic averages; in fact, we
have a sufficiently good group of samples to render us nearly
independent of weights. When this is the case we can say with
safety that the number required lies in the neighbourhood of the
group given by the various systems of weights, and choose what
appears the most logical system for the estimate we adopt. In
the paper referred to, five different systems applied to only
fourteen commodities give results for the increase of consumption
all between 13.8 and 20.1 per cent. in the period 1873-96.

* For the discussion of these questions see “Cost of Living,” Statistical
Journal, May 1919.

† The following illustration is based on Mr. G. H. Wood's paper on
“Some Statistics of Working Class Progress,” Statistical Journal, 1899.
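The claim that the index is nearly independent of the weights can be tried on a small scale. The sketch below is only an illustration: the five relatives and the three systems of weights are hypothetical and are not taken from the paper referred to.

```python
# Hypothetical consumption relatives (base year = 100) for five commodities.
# Illustrative numbers only; they are NOT the figures of Mr. Wood's paper.
relatives = [112.0, 118.0, 121.0, 115.0, 124.0]


def weighted_index(values, weights):
    """Weighted arithmetic mean of the relatives."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Three assumed systems of weights: equal, by supposed value, and arbitrary.
weight_systems = {
    "equal": [1, 1, 1, 1, 1],
    "by value": [5, 3, 2, 4, 1],
    "arbitrary": [2, 7, 1, 3, 4],
}

indices = {name: weighted_index(relatives, w) for name, w in weight_systems.items()}
spread = max(indices.values()) - min(indices.values())

for name, value in indices.items():
    print(f"{name:10s} {value:6.1f}")
print(f"spread     {spread:6.1f}")
```

The narrow spread here follows from the relatives lying fairly close together; with widely scattered relatives the choice of weights would matter more.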

[Marginal note: Wage index-number.]

The application of index-numbers to wage statistics does not
involve any fresh principles. It is not permissible to ignore
the change of weights in this case; for otherwise
we should not allow for the general tendency to
increase numbers where wages are rising. There is great
liability to “biassed” errors in separate averages; for wages
for overtime, specially high piece-wages, wages of large uncombined
classes of low-skilled or badly paid workpeople, may often
be omitted in wage records. These biassed errors, however,
tend to disappear in comparison; and it may prove possible
to construct a wage index-number of very fair precision.*

Note added in 1936. Write $I_1 S(QP) = S(Qp)$, $I_2 S(qP) = S(qp)$,
in the notation of p. 210, and $J_1 S(QP) = S(qP)$ for a corresponding
index of quantity.

Then

$$S\,QP\left(\frac{p}{P} - I_1\right)\left(\frac{q}{Q} - J_1\right) = S(qp) - J_1 S(Qp) - I_1 S(qP) + I_1 J_1 S(QP)$$

$$= I_2 S(qP) - I_1 J_1 S(QP) - I_1 S(qP) + I_1 J_1 S(QP)$$

$$= (I_2 - I_1)\,S(qP).$$

∴ when an increase of price in a commodity greater than that measured
by $I_1$ goes with a fall in quantity consumed, as compared with $J_1$, $I_2$ is
less than $I_1$. Part of the rise of prices may be expected to be evaded
in this way, e.g., on p. 210, $I_2 = 2.02$, $I_1 = 2.12$. [Cf. International
Comparisons of Cost of Living, I.L.O., Geneva, 1934, pp. 15-16.]
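The identity in the note can be checked numerically. The following is a minimal sketch with hypothetical quantities and prices for three commodities (they are not the figures of p. 210); `S` is a helper name introduced here for the sum of products.

```python
# Hypothetical base-year (Q, P) and second-year (q, p) quantities and prices.
Q = [10.0, 4.0, 6.0]
P = [1.0, 5.0, 2.0]
q = [8.0, 5.0, 6.5]
p = [2.5, 9.0, 4.0]


def S(a, b):
    """Sum of products, as S(QP) in the note."""
    return sum(x * y for x, y in zip(a, b))


I1 = S(Q, p) / S(Q, P)   # price index with base-year quantities
I2 = S(q, p) / S(q, P)   # price index with second-year quantities
J1 = S(q, P) / S(Q, P)   # quantity index with base-year prices

# Left side: S QP (p/P - I1)(q/Q - J1); right side: (I2 - I1) S(qP).
lhs = sum(Qi * Pi * (pi / Pi - I1) * (qi / Qi - J1)
          for Qi, Pi, qi, pi in zip(Q, P, q, p))
rhs = (I2 - I1) * S(q, P)

print(f"I1 = {I1:.4f}, I2 = {I2:.4f}")
print(f"lhs = {lhs:.6f}, rhs = {rhs:.6f}")
```

In these assumed figures the commodity whose price rises most is the one whose consumption falls, so the sum on the left is negative and $I_2$ comes out below $I_1$, as the note states.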

* For a complete illustration of method and of the various factors
involved, see “The Statistics of Wages in the United Kingdom. Part XIV.:
Engineering and Shipbuilding,” Statistical Journal, March 1906, pp. 154 seq.,
especially pp. 166, 168 and 185.
Section 1. General.

[Marginal note: Necessity of interpolation.]

It is very often the case in practical statistics that we are
not able to make serial estimates as frequent, or descriptions
of groups as detailed, as is necessary for their use
in further investigations. Thus the population is
only counted once in ten years; but we need to bring monthly
and annual accounts (births, deaths, trade returns, etc.) into
close relation to the existing number of people, and estimates
for the budget and the yield of taxes must be based on the
assumed number of taxpayers for the current year; it is
therefore necessary to interpolate estimates for the number of
the people in intercensal years. Again, interpolation is needed
for the statement of the distribution of the population according
to age, a tabulation which is necessary for actuarial work
and for sociological purposes. The ages returned on the
householder's schedule are nominally correct to the year, but
in practice they are known to be inaccurate, tending to group
themselves in the neighbourhood of round numbers; but the
returns for such age periods as 35-45 years are more correct,
since the persons who return themselves as 40 years old are
probably within 5 years of that age. The original returns
are so erroneous that prior to 1911 they were not published,
but the numbers were only given in the ten-yearly periods;
from the numbers so given, it is necessary to estimate the
numbers for the individual years. Again, the compilers of the
wage census of 1886-91 enumerate the numbers earning wages
“of 15s. and under 20s.,” “of 20s. and under 25s.,” and so
on, but not the numbers in shilling limits. In problems
relating to wages we often need more detail; and when we are
comparing these wages with a similar group in France, we
must devise a scheme by which grades of 2 francs can be compared
with grades of 5s., by a suitable system of interpolation.

Such a necessity is very common when we wish to compare
groups, which are similar but tabulated on diverse systems.
Thus, two countries conduct their census at different dates.
In one country the age groups are of fifteen years, in another
of ten; in one, “young persons” are those under 21;
in another, those under 18. Occasional estimates seldom
correspond in date; wage statistics are found for 1840, 1850,
and 1892 in France, and for 1866, 1885, 1886, 1891, and 1906
in England. Similar differences are found when we are comparing
county with county; and a discussion of the method of
determining averages in such a case will illustrate some of the
elementary problems of interpolation.

[Marginal note: Elementary example.]

Suppose that the figures printed in Roman type in the
following table are accurate returns of the weekly
wages in three districts, and that we wish to find
the average change in the three together.



It is clear that there is something to be learnt about the
general course of wages from the data, but the lessons are not
obvious. The following figures, printed in the table in italics,
are those which naturally suggest themselves. There is no
sign in A of any change between 1862 and 1866, so we write
15s. for 1864. Judging from B, the figure for 1870 is not
likely to have been lower than that for 1864, so we write 15s.
for A in 1870. A is now complete; we notice that in A the
first rise was complete by 1862, and assuming the same in B,
we obtain 19s. for 1862. In C there is a rise between 1864
and 1866, while in A there is no change from 1866 to 1870;
B will correspond if we write 20s. in 1866. If we write for B,
19s. 6d. in 1871, 21s. in 1875, and 20s. 6d. in 1880, we shall
have close correspondence with A from 1866 to 1881. Similar
reasons lead to the numbers interpolated for C. The unweighted
average can then be calculated year by year, which
could not be done directly from the data. This average
reflects all the changes in the original figures and gives no special
predominance to any. It may be regarded as the most probable
series that can be based on the given information.

[Marginal note: Assumptions made.]

We will now notice the assumptions tacitly made in proceeding
by this method. First, it has been assumed that
there are no sudden jumps, that such a figure as
20s. for A 1864 is inadmissible; this is only justifiable
if we are acquainted with the general causes which influence
the rate of wages, and know that there was no violent disturbance
in the intermediate dates. We could not make this
assumption as to wages in the cotton trade in the time of the
American Civil War, nor can we make it over a long series
of years. Secondly, it has been assumed that in the absence
of evidence to the contrary the rise or fall has been uniform.
Thus, in B 1878-81, the wage in 1880 is assumed to be intermediate
between 1878 and 1881; if there had been no indication
from A that it was half-way between in point of wages,
it might have been said that in point of time it was two-thirds
of the way, and 20s. 8d. should be interpolated for 1879 and
20s. 4d. for 1880, if it was worth while to depart from round
numbers. Thirdly, it has been assumed that the course of
wages in the three districts was similar. Thus in A there is
a rise from 1860-62, but there is no further improvement at
any rate before 1866; it is consequently assumed that the rise
registered in B and C before 1864 actually took place before
1862. Again, when considering the period 1870-75, we notice
that in A there is a fall till 1871, and a sharp rise to 1875, and
no change to 1878; in B, therefore, it is assumed that the wage
of 1875 is equal to that of 1878, and the fall in 1878 may be
allowed because it increases the sharpness of the rise in 1871-75.
In C it is doubtful whether the 12s. in 1871 should not rather
be 11s. 6d. The reasons against are that a gain on a low wage
is often not so easily lost as a gain on a high one; 6d. is a
larger drop proportionately on 12s. than on 15s.; that the rise
of 3s. 6d. which would then be shown 1871-75 is a larger
proportionate rise than in either A or B; and that the existence
of the fall in 1870-71 depends only on the evidence of a
fall between 1866-71. When the figures are few in number,
it is necessary to examine them in this way to pick out the
most probable; and it is often fairly easy to fill in the figures
which satisfy all the existing evidence fairly closely.
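The second assumption, uniform change in point of time, is simple proportion. The sketch below reproduces the 20s. 8d. and 20s. 4d. mentioned above; the end values of 21s. for 1878 and 20s. for 1881 are inferred from those two figures and are not quoted from the table, and the function names are introduced here for illustration.

```python
def interpolate_uniform(year0, value0, year1, value1, year):
    """Value at `year` on the straight line joining the two given points."""
    fraction = (year - year0) / (year1 - year0)
    return value0 + fraction * (value1 - value0)


def shillings_pence(pence):
    """Format a sum given in pence as shillings and pence."""
    s, d = divmod(round(pence), 12)
    return f"{s}s. {d}d."


# Assumed end values for district B, in pence: 21s. in 1878, 20s. in 1881.
w1878, w1881 = 21 * 12, 20 * 12

estimates = {
    year: shillings_pence(interpolate_uniform(1878, w1878, 1881, w1881, year))
    for year in (1879, 1880)
}
for year, wage in estimates.items():
    print(year, wage)
```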

The question at once arises, What certainty have we that
these quantities, by hypothesis unknown, are in reality anywhere
near the figures which on the face are most probable?

In some cases of interpolation, dealt with presently, the
answer can be given as a statement of mathematical probability,
such as: it is 2 to 1 against a divergence
of 6d. from the assigned figure, 30 to 1 against
one of 1s., 1000 to 1 against one of 2s. 6d., and so on; but
in the figures most often cropping up in investigations it is
not possible to assign such a precise probability. There is
one rough but useful way of testing the accuracy of such
interpolation as in the case before us which can be explained
by an example. Test how far we can throw out our calculated
average for 1870, without violently infringing the common-sense
of the question. Make A and C as large as possible in
these dates; we may perhaps suppose a rise of 1s. above 1866,
seeing that there is one in B between 1864 and 1870. We can
hardly suppose either that 1870 is as high as 1875-78, or that
there is a great drop of as much as 2s. in the single year, if we
are acquainted with the causes that determine the wages at
those dates. Let the highest wage we can assign to A and C
be 16s. 6d. and 13s. 6d. respectively. Our average is then
16s. 8d. instead of 15s. 8d. Similarly, we might perhaps think
that 14s. and 11s. were the lowest possible in A and C in 1870;
then the average would be 15s. Assuming that we know enough
about the general trend of events at these dates to assign limits
in this way, we can say it appears improbable that the average
wage in 1870 was less than 15s. or more than 16s. 8d., and that
the evidence points to 15s. 8d.
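The test of limits is a plain recomputation of the average with the interpolated figures pushed to their extremes. In the sketch below the given wage of 20s. for B, and the central interpolated values of 15s. for A and 12s. for C, are inferred from the three averages quoted above, since the table itself is not before us; the helper names are introduced here.

```python
def pence(shillings, d=0):
    """Convert shillings and pence to pence."""
    return 12 * shillings + d


def mean(*wages):
    """Unweighted average of the district wages."""
    return sum(wages) / len(wages)


def shillings_pence(value):
    """Format a sum given in pence as shillings and pence."""
    s, d = divmod(round(value), 12)
    return f"{s}s. {d}d."


B_1870 = pence(20)  # assumed: the given (not interpolated) figure for B

averages = {
    "lowest": mean(pence(14), B_1870, pence(11)),
    "most probable": mean(pence(15), B_1870, pence(12)),
    "highest": mean(pence(16, 6), B_1870, pence(13, 6)),
}
for label, value in averages.items():
    print(f"{label:14s} {shillings_pence(value)}")
```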

The accuracy of our interpolation then depends: (1) on
knowledge of the possible fluctuations of the figures, to be
obtained by a general inspection of the fluctuations at dates
for which they are given; (2) on knowledge of the course of
the events with which the figures are connected.

[Marginal note: Numerical example.]

A second example of a similar kind* may be
given to illustrate the numerical calculation.

* Taken from “Agricultural Wages in England,” in the Statistical Journal,
December 1898, by the present author.
Weekly Agricultural Wages in Northern Counties.

                               1867-69.     1869-70.
                               s.  d.       s.  d.
Cheshire                      _13   1_      13   6
Lancashire                     15   0       15   0
West Riding of Yorkshire       14   6       16   5
East Riding of Yorkshire       14   6      _14  11_
North Riding of Yorkshire      14   6       15   4
Durham                         16   6       16   0
Northumberland                 16   6       16   7
Cumberland                     14   4      _14   9_
Westmoreland                   15   7      _16   1_

Roman figures given. Italic figures (marked _thus_) interpolated.

The averages of the wages in the five districts for which
data exist in both periods are 15s. 4.8d. in 1867-69 and 15s.
10.4d. in 1869-70, that is in the ratio 33 : 34. If we assume that
the wages in the other counties have been influenced by similar
causes and increased in the same ratio, we obtain the figures
interpolated in the table. The unweighted averages for the
northern counties are now 14s. 11d. and 15s. 5d. in the two
periods, instead of 15s. 3d. and 15s. 5d., the averages of the
given numbers. For general comparison all over England
between these two years we should have been obliged to
neglect the missing counties in both years, which would have
unfairly lowered the general average, since these counties have
in recent times had wages above the English average though
below that of the northern district. At the same time we
should have unfairly raised the apparent average of the northern
district. We should also have lost the probable figures for the
special counties at the earlier date which are on a fairly safe
basis; for the wages in these counties of the Northern District
remain in nearly the same order through the last fifty years.
At the same time it is easily seen that these wages are not so
accurately known as those not interpolated, and it is well to
notice in arguments based on such figures, to what extent the
interpolated figures are involved.
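The calculation just described can be retraced as follows. This is a sketch only: the wages are those of the table above converted to pence, and which entries are treated as missing (`None`) is inferred from the ratio 33 : 34 rather than read from the italic type of the original.

```python
def pence(shillings, d):
    """Convert shillings and pence to pence."""
    return 12 * shillings + d


def shillings_pence(value):
    """Format a sum given in pence as shillings and pence."""
    s, d = divmod(round(value), 12)
    return f"{s}s. {d}d."


# (1867-69, 1869-70); None marks a return assumed to be missing.
wages = {
    "Cheshire": (None, pence(13, 6)),
    "Lancashire": (pence(15, 0), pence(15, 0)),
    "West Riding": (pence(14, 6), pence(16, 5)),
    "East Riding": (pence(14, 6), None),
    "North Riding": (pence(14, 6), pence(15, 4)),
    "Durham": (pence(16, 6), pence(16, 0)),
    "Northumberland": (pence(16, 6), pence(16, 7)),
    "Cumberland": (pence(14, 4), None),
    "Westmoreland": (pence(15, 7), None),
}

# Ratio of the later to the earlier average in the counties complete in both periods.
complete = [(a, b) for a, b in wages.values() if a is not None and b is not None]
ratio = sum(b for _, b in complete) / sum(a for a, _ in complete)

# Fill each gap on the assumption that the county moved in the same ratio.
filled = {}
for county, (a, b) in wages.items():
    if a is None:
        a = b / ratio
    elif b is None:
        b = a * ratio
    filled[county] = (a, b)

avg_early = sum(a for a, _ in filled.values()) / len(filled)
avg_late = sum(b for _, b in filled.values()) / len(filled)

for county, (a, b) in filled.items():
    print(f"{county:15s} {shillings_pence(a):>10s} {shillings_pence(b):>10s}")
print(f"{'Average':15s} {shillings_pence(avg_early):>10s} {shillings_pence(avg_late):>10s}")
```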

A process very similar to that just employed is used in
giving marks at school to students who are absent from a
lesson; attention is paid both to the particular student's
general place in the class order, and to the average value of the
marks obtained by the rest of the class in the lesson missed.

Though the method be fairly complete it is very important
to notice that interpolated figures rest on quite a different class
of evidence to those which are the result of direct evidence.
In some cases they may represent quantities
which have no existence (as in the case of school
marks) and which are only used for convenience of
calculation. In others they are simply figures
adopted as those which in default of definite knowledge appear
most probable. They must always be clearly indicated as interpolations;
it is always well to state the method by which they
are obtained, and any subsidiary information which may be
regarded as direct evidence of their accuracy, and if practicable
they may be given not as exact, but as lying between certain
limits; thus the interpolated figures for Cheshire might be
written 12s. 6d. to 13s. 6d., instead of 13s. 1d.

Several different cases are met with in interpolation, some 
of which are treated algebraically in the next section, while 
others can be illustrated at once by numerical examples. 

The Graphic Method. If we know the values of quantities
at isolated positions, such as the numbers of the population
at the ages 25 to 35, 35 to 45, etc.; the
population in 1871, 1881, 1891, etc.; wages in
1860, 1866, 1870, etc.; the numbers whose wages are from
15s. to 20s., 20s. to 25s., etc., we may represent the facts by
such a diagram as:

[Diagram: the quantity plotted against the years 1860, 1865, 1870, 1875, 1880, 1885.]

Suppose that we need the value of the quantity in 1875.
If we were only given the two points C and D, the simplest
hypothesis, and the one to be made in the absence of any
evidence to the contrary, is that the quantity increased
uniformly between C and D; representing such an increase
by the straight line C D, the height of the point on it at 1875
will represent the quantity in 1875.
If the point E is also given, the hypothesis represented by
the straight lines C D, D E will not stand, for it assumes a
sudden break in the regularity at the point D in 1877, for which
there is no evidence. We must take into account all the
points given, and through them all a line must be drawn whose
curvature is as smooth as possible, for in the absence of evidence
to the contrary, sudden changes in the quantities may be
assumed not to exist. Such a curve can be constructed on
mathematical principles, or may be drawn freehand; if the
latter, it will often be quite as near the facts as the arguments
will allow us to go.

This method only applies to continuous quantities, such as 
numbers at different ages, population at different dates, earners 
at different wages in a very large group of wages. Thus for all 
England the average wage must change gradually, but the 
wage of the London builders changed suddenly as the result of 
strikes and arrangements at certain dates. In this case we 
must draw the figure to correspond as closely as possible to 
the evidence, such as:

[Diagram: the wage plotted against the line of years, a broken line through the points A, B, C, D, E.]

where A B represents a sudden rise; B C a gradually accelerated
increase due to improving trade, C D a slow falling off from
the wage reached at C, and D E a determined and successful
effort to recover the lost ground.

Periodic Figures. If we know the annual averages of
figures which have a yearly period and a sufficient number of
monthly averages to estimate the periodic fluctuations by the
method described on pp. 160 seq., we can interpolate figures for
any month for which the returns are incomplete with fair
accuracy. Thus if we are dealing with the numbers of
unemployed as given in the Labour Gazette, we find a periodicity
which is not very strongly marked in all the months, but there
is in general a fall in the spring and a rise in the late autumn,
and June is generally the minimum month. We can then
make use of the small diagrams on pp. 165-6, and, having marked
in all the information we have, draw the waves on the rising,
stationary, or descending line of averages, so that the fluctuating
lines shall pass through all the given points. We can
obtain an idea of the accuracy of the resulting figures by noticing
the general characteristics of the given figures; we find
that the percentage unemployed has never changed more than
two units in one month, that there are no fluctuations which
have lasted less than three or four months, and that the
percentages have never been below 1 or above 10. Finally,
we can look at the trade history of particular dates, and in
the light we thus obtain reject any improbable figures.

Use of Subsidiary Curves. — If we are able, by the 
methods described in Chapter VII, p. 158 or p. 174, to 
find a close connection between two series, we can use the 
more complete of them to assist the interpolation of any missing 
figures in the other. We must first investigate carefully the 
closeness and nature of correspondence at the dates for which 
we have complete figures in both series. Then we can draw 
diagrams, similar to those facing p. 155, one of the lines being 
incomplete. Then completing the broken line, so as to bring 
it into as close resemblance with the completed line as the 
given points allow, we shall obtain the most probable values 
for the missing figures. The accuracy of the result can be 
tested as in the previous case. This method may reasonably 
be used in interpolating figures for the yield from one source of 
revenue by means of the yield from another ; for the value of 
exports from that of imports ; for the marriage rate from foreign 
trade; for the wages in one district from those in another; 
for the number of unemployed from the changes in consumption 
of foods ; for changes in parts of the population, when we know 
the changes in the whole, and for many other series. 

Section 2. Algebraic Treatment.

The problem of interpolation to which most attention has
been given may be stated as follows: When one quantity is
subject to continuous regular change, and a second quantity
changes in connection with it, and we know or can estimate
directly only some discontinuous values of this second quantity,
it is required to find the probable values of the second quantity
which correspond to given values of the first: for instance,
given the expectation of life at the ages 15, 20, 25, etc., it is
required to find it for intermediate ages; given the population
of the country in 1871, 1881, 1891, 1901, find it at intermediate
dates. The only permissible assumptions are that
the quantity changes continuously, that is with no break
at any figure, and that the rate of change of the quantity is also
continuous, that is that the line representing its value is not
angular, but smooth. The problem can only be attacked
systematically by the use of the algebraic method of finite
differences, and it is necessary to begin with definitions of
notation and to obtain certain fundamental formulae.

1. Let $y$ be a continuous function of $x$, and let $y_0, y_1, y_2, \ldots$
be the values of $y$ when $x = x_0, x_1, x_2, \ldots$

Arrange a table thus:

Values    Values    First           Second          Third
of x.     of y.     Differences.    Differences.    Differences.

$x_0$     $y_0$
                    $\Delta_0^1$
$x_1$     $y_1$                     $\Delta_0^2$
                    $\Delta_1^1$                    $\Delta_0^3$
$x_2$     $y_2$                     $\Delta_1^2$
                    $\Delta_2^1$                    $\Delta_1^3$
$x_3$     $y_3$                     $\Delta_2^2$
                    $\Delta_3^1$
$x_4$     $y_4$
...       ...

Here each $\Delta$ is obtained by subtracting the entry just higher
than it in the previous column from that just lower than it; e.g.,
$\Delta_0^1 = y_1 - y_0$, $\Delta_1^1 = y_2 - y_1$, ... $\Delta_0^2 = \Delta_1^1 - \Delta_0^1$, ... $\Delta_0^3 = \Delta_1^2 - \Delta_0^2$, ...
The table may be supposed to continue indefinitely
downwards and to the right.

We have at once:

$$\Delta_0^2 = \Delta_1^1 - \Delta_0^1 = (y_2 - y_1) - (y_1 - y_0) = y_2 - 2y_1 + y_0$$

$$\Delta_t^2 = y_{2+t} - 2y_{1+t} + y_t, \text{ where } t \text{ is any integer,}$$

$$\Delta_0^3 = (y_3 - 2y_2 + y_1) - (y_2 - 2y_1 + y_0) = y_3 - 3y_2 + 3y_1 - y_0$$

$$\Delta_t^3 = y_{3+t} - 3y_{2+t} + 3y_{1+t} - y_t$$

and generally, by an induction similar to that commonly used
in the proof of the Binomial Theorem and involving the same
coefficients:

$$\Delta_0^r = y_r - r\,y_{r-1} + \frac{r(r-1)}{1 \cdot 2}\,y_{r-2} - \frac{r(r-1)(r-2)}{1 \cdot 2 \cdot 3}\,y_{r-3} + \ldots \text{ to } r+1 \text{ terms} \qquad (\alpha)$$

and

$$\Delta_t^r = y_{r+t} - r\,y_{r+t-1} + \frac{r(r-1)}{1 \cdot 2}\,y_{r+t-2} - \ldots \text{ to } r+1 \text{ terms,} \qquad (\beta)$$

where $r$ is any integer.

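We have also:

$y_1 = y_0 + \Delta_0^1$, and $y_2 = y_1 + \Delta_1^1 = (y_0 + \Delta_0^1) + (\Delta_0^1 + \Delta_0^2) = y_0 + 2\Delta_0^1 + \Delta_0^2$,

and similarly $\Delta_1^1 = \Delta_0^1 + \Delta_0^2$,

and $\Delta_2^1 = \Delta_1^1 + \Delta_1^2 = (\Delta_0^1 + \Delta_0^2) + (\Delta_0^2 + \Delta_0^3) = \Delta_0^1 + 2\Delta_0^2 + \Delta_0^3$.

∴ $y_3 = y_2 + \Delta_2^1 = y_0 + 3\Delta_0^1 + 3\Delta_0^2 + \Delta_0^3$,

and similarly $\Delta_3^1 = \Delta_0^1 + 3\Delta_0^2 + 3\Delta_0^3 + \Delta_0^4$.

Continuing this process we again have the Binomial Coefficients,
so that:

$$y_r = y_0 + r\,\Delta_0^1 + \frac{r(r-1)}{1 \cdot 2}\,\Delta_0^2 + \ldots \text{ to } r+1 \text{ terms} \qquad (\gamma)$$

$$\Delta_r^t = \Delta_0^t + r\,\Delta_0^{t+1} + \frac{r(r-1)}{1 \cdot 2}\,\Delta_0^{t+2} + \ldots \text{ to } r+1 \text{ terms} \qquad (\delta)$$

and starting further down the scale:

$$y_{r+s} = y_s + r\,\Delta_s^1 + \frac{r(r-1)}{1 \cdot 2}\,\Delta_s^2 + \ldots \text{ to } r+1 \text{ terms,} \qquad (\epsilon)$$

where $s$ is any integer.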

For example, let $y = x^4$, and let the values of $x$ be $0, h, 2h, 3h, \ldots$

Values    Values                     Differences.
of x.     of y.       First.     Second.    Third.     Fourth.    Fifth.

0         0
                      h^4
h         h^4                    14h^4
                      15h^4                 36h^4
2h        16h^4                  50h^4                 24h^4
                      65h^4                 60h^4                 0
3h        81h^4                  110h^4                24h^4
                      175h^4                84h^4                 0
4h        256h^4                 194h^4                24h^4
                      369h^4                108h^4
5h        625h^4                 302h^4
                      671h^4
6h        1296h^4

Formula ($\alpha$) gives $\Delta_0^4 = (256 - 4 \times 81 + 6 \times 16 - 4 \times 1 + 0)h^4 = 24h^4$, where $r$ is taken as 4.

Formula ($\beta$) gives $\Delta_2^5 = (7^4 - 5 \times 6^4 + 10 \times 5^4 - 10 \times 4^4 + 5 \times 3^4 - 2^4)h^4 = 0$, where $r = 5$, $t = 2$.

Formula ($\gamma$) gives $(5h)^4 = (0 + 5 + 10 \times 14 + 10 \times 36 + 5 \times 24 + 0)h^4 = 625h^4$, where $r = 5$.

Formula ($\delta$) gives $\Delta_2^3 = (36 + 2 \times 24 + 0)h^4 = 84h^4$, where $r = 2$, $t = 3$,

and

Formula ($\epsilon$) gives $(5h)^4 = (16 + 3 \times 65 + 3 \times 110 + 84)h^4 = 625h^4$, where $r = 3$, $s = 2$.
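The table and the five checks above can be reproduced mechanically. The sketch below takes $h = 1$ and carries the values of $x$ one step further than the table (to $x = 7$), so that formula ($\beta$) with $r = 5$, $t = 2$ has all the values it needs; the function names are introduced here for illustration.

```python
from math import comb


def difference_table(ys):
    """Return [ys, first differences, second differences, ...]."""
    table = [list(ys)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table


ys = [x ** 4 for x in range(8)]      # y = x^4 with h taken as 1
table = difference_table(ys)         # table[k][t] is the k-th difference with suffix t


def delta_from_y(ys, r, t=0):
    """Formulae (alpha) and (beta): the r-th difference from the values of y."""
    return sum((-1) ** j * comb(r, j) * ys[r + t - j] for j in range(r + 1))


def y_from_deltas(table, r, s=0):
    """Formulae (gamma) and (epsilon): y_{r+s} from the differences at suffix s."""
    return sum(comb(r, j) * table[j][s] for j in range(r + 1))


def delta_from_leading(table, r, t):
    """Formula (delta): the t-th difference at suffix r from the leading differences."""
    return sum(comb(r, j) * table[t + j][0] for j in range(r + 1))


print("alpha  :", delta_from_y(ys, 4))          # fourth difference
print("beta   :", delta_from_y(ys, 5, 2))       # fifth difference
print("gamma  :", y_from_deltas(table, 5))      # y at x = 5
print("delta  :", delta_from_leading(table, 2, 3))
print("epsilon:", y_from_deltas(table, 3, 2))   # y at x = 5 again
```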
2. If the relation between $y$ and $x$ is of the form

$$y = a_0 + a_1 x + \ldots + a_n x^n,$$

and the values of $x$ are in Arithmetic Progression, viz. $x_0, x_0 + h, \ldots, x_0 + (n-1)h$,
then it can be shown that $\Delta_0^n = a_n \cdot h^n \cdot n!$,
and that there are no higher differences.

For

$$\Delta_0^1 = a_0 - a_0 + a_1(x_0 + h - x_0) + \ldots + a_n\{(x_0 + h)^n - x_0^n\}$$

$$= h a_1 + \ldots + a_n\{n h x_0^{n-1} + \text{lower powers of } x_0\},$$

$$\Delta_1^1 = h a_1 + \ldots + a_n\{n h (x_0 + h)^{n-1} + \text{lower powers of } (x_0 + h)\},$$

$$\Delta_0^2 = 2h^2 a_2 + \ldots + a_n\{n(n-1) h^2 x_0^{n-2} + \text{lower powers of } x_0\}.$$

Thus $\Delta_0^1$, $\Delta_0^2$ contain no higher powers than $x_0^{n-1}$ and $x_0^{n-2}$
respectively.

Continuing this process:

$$\Delta_0^n = a_n\,n(n-1) \ldots 3 \cdot 2 \cdot 1 \cdot h^n = a_n h^n\,n! \qquad (\zeta)$$

and $\Delta_0^{n+1}$ and higher differences disappear.

In the example above where $y = x^4$, $a_n = 1$, $n = 4$, $\Delta_0^4 = 1 \cdot h^4 \cdot 4! = 24h^4$, and $\Delta_0^5 = 0$.

Conversely if we assume that there is no difference above
the $n$th, it is shown in the following note that the equation
between $y$ and $x$ is of the form $y = a_0 + a_1 x + \ldots + a_n x^n$.
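Formula ($\zeta$) may be tried on a polynomial other than $x^4$ and with an interval other than unity. In the sketch below the cubic $3 - 2x + 5x^3$, the starting value $x_0 = 2$ and the interval $h = \frac12$ are all hypothetical choices; exact fractions are used so that the vanishing of the fourth differences is not blurred by rounding.

```python
from fractions import Fraction
from math import factorial


def nth_differences(values, n):
    """Difference a list of equally spaced values n times."""
    for _ in range(n):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


# Hypothetical polynomial y = 3 - 2x + 5x^3, so n = 3 and a_n = 5.
coefficients = [Fraction(3), Fraction(-2), Fraction(0), Fraction(5)]


def poly(x):
    """Evaluate the polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coefficients))


x0, h = Fraction(2), Fraction(1, 2)
ys = [poly(x0 + i * h) for i in range(6)]

n = len(coefficients) - 1
third = nth_differences(ys, n)        # every entry should equal a_n * h^n * n!
fourth = nth_differences(ys, n + 1)   # every entry should vanish
expected = coefficients[n] * h ** n * factorial(n)

print("third differences :", [str(d) for d in third])
print("a_n h^n n!        :", expected)
print("fourth differences:", [str(d) for d in fourth])
```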


Note. The relation between Differences and Derived Functions (or Differential
Coefficients) is very important in the theory of the former, and can be
exhibited concisely by the method of operators.

Using the usual notation of the calculus, we have by Taylor's Theorem

$$f(x+h) = f(x) + h f'(x) + \frac{1}{2!} h^2 f''(x) + \ldots = e^{hD} f(x),$$

where $D$ stands for the operation of differentiation, and $e^{hD}$ is to be expanded
into $1 + hD + \frac{1}{2!} h^2 D^2 + \ldots$ and then applied term by term to $f(x)$. The use of
$D$ as an algebraic symbol is justified because of the relationships
$D\{D f(x)\} = D^2 f(x)$, $D^m\{D^n f(x)\} = D^{m+n} f(x)$; $aD(f(x)) = D(a f(x))$, etc.

Now $\Delta f(x) = f(x+h) - f(x) = (e^{hD} - 1) f(x)$.
$\Delta\{a f(x)\} = a \Delta f(x)$, $\Delta\{\Delta f(x)\} = \Delta^2 f(x)$, $\Delta^m\{\Delta^n f(x)\} = \Delta^{m+n} f(x)$, and $\Delta$ can be
used as an algebraic symbol.

Hence $\Delta = e^{hD} - 1$,

$$\Delta^n = (e^{hD} - 1)^n = \left(hD + \tfrac{1}{2!} h^2 D^2 + \ldots\right)^n = h^n D^n \left(1 + \tfrac{1}{2} hD + \tfrac{1}{6} h^2 D^2 + \ldots\right)^n$$

$$= h^n D^n \left(1 + \frac{n}{2} hD + \frac{n(3n+1)}{24} h^2 D^2 + \ldots\right) \qquad \text{(i)}$$

and $hD = \log(1 + \Delta)$,

$$h^n D^n = \{\log(1 + \Delta)\}^n = \left(\Delta - \tfrac{1}{2}\Delta^2 + \tfrac{1}{3}\Delta^3 - \ldots\right)^n$$

$$= \Delta^n \left(1 - \frac{n}{2}\Delta + \frac{n(3n+5)}{24}\Delta^2 + \ldots\right) \qquad \text{(ii)}$$

Now if $f(x) = a_0 + a_1 x + \ldots + a_n x^n$, $D^n f(x) = a_n \cdot n!$, and $D^{n+1} f(x) = 0 = D^{n+2} f(x) = \ldots$

∴ $\Delta^n f(x) = h^n a_n\,n!$, and $\Delta^{n+1} f(x) = h^{n+1} D^{n+1}(1 + \ldots) f(x) = 0$ from equation (i),
as in the text.

Conversely if $\Delta^{n+1} f(x) = 0 = \Delta^{n+2} f(x) = \ldots$, then from (ii) $D^{n+1} f(x) = 0$,
$D^n f(x) = \text{const} = c_n$, $D^{n-1} f(x) = c_n x + c_{n-1}$, $D^{n-2} f(x) = \frac{c_n x^2}{2!} + c_{n-1} x + c_{n-2}$, ...

$$f(x) = \frac{c_n x^n}{n!} + \ldots + c_1 x + c_0.$$

Hence if the $n$th difference is constant, the function is rational, integral,
and of the $n$th degree.

Newton's interpolation formula, discussed below ($\iota$), can quickly be obtained
by the use of operators; thus

$$y = f(x_0 + k) = e^{kD} f(x_0) = (1 + \Delta)^{k/h} f(x_0), \text{ since } e^{hD} = 1 + \Delta,$$

$$= f(x_0) + \frac{k}{h}\,\Delta f(x_0) + \frac{k}{h} \cdot \frac{k - h}{2h}\,\Delta^2 f(x_0) + \ldots$$

$$= y_0 + \frac{x - x_0}{h}\,\Delta_0^1 + \frac{x - x_0}{h} \cdot \frac{x - x_0 - h}{2h}\,\Delta_0^2 + \ldots, \text{ where } x = x_0 + k.$$

When the $n$th difference (or the $n$th derived function) is
zero, formula ($\beta$) shows that

$$y_{n+t} - n\,y_{n-1+t} + \frac{n(n-1)}{1 \cdot 2}\,y_{n-2+t} - \ldots \pm y_t = 0 \qquad (\eta)$$

for all values of $t$.

3. The common formula of interpolation depends on the
assumption that a continuous function, $y = f(x)$, can represent
the observations in the neighbourhood of the positions for which
values are to be found.

It is assumed that the function can be expanded in powers
of $x$, as is generally the case with continuous functions,* and we
may write:

$$y = a_0 + a_1 x + a_2 x^2 + \ldots + a_n x^n, \qquad (\theta)$$

where $n$, the index of the highest power of $x$, is still to be decided.
By proper choice of $a_0, a_1, \ldots a_n$ this equation can be satisfied
by any $(n+1)$ pairs of values of ($x$ and $y$). Thus for the straight
line $y = a_0 + a_1 x$, two points (or pairs of values) can be chosen, for
the parabola $y = a_0 + a_1 x + a_2 x^2$ three points, and so on.

The simplest form is $y = a_0 + a_1 x$, and the use of this
assumes that interpolation by proportional parts (the method
generally employed in using logarithmic, trigonometric and
other mathematical tables) is sufficiently accurate. In this
case the first difference and the first derived function (or
gradient) are constant.

The parabola takes account of three values, and its use
assumes a uniform change of gradient, the second difference
and the second derived function being constant.

The introduction of further terms allows for variation of
higher differences, and the closing of the expansion at the
term in $x^n$ corresponds to constancy of the $n$th difference.

* More exactly for functions which are continuous, and whose derived
functions of all orders are continuous, and not infinite, at the values of x in
question.

If the problem is to interpolate in a known mathematical
function we can test how far the neglect of the variation in
the $n$th difference can affect the calculation. Thus in the
7-figure logarithm table we have:

                                              Differences.
Number.  Logarithm.   First.      Second.      Third.       Fourth.      Fifth.

20       1.3010300
                      .0211893
21       1.3222193                -.0009859
                      .0202034                 +.0000876
22       1.3424227                -.0008983                 -.0000110
                      .0193051                 +.0000766                 +.0000015
23       1.3617278                -.0008217                 -.0000095
                      .0184834                 +.0000671                 +.0000015
24       1.3802112                -.0007546                 -.0000080
                      .0177288                 +.0000591                 +.0000016
25       1.3979400                -.0006955                 -.0000064
                      .0170333                 +.0000527
26       1.4149733                -.0006428
                      .0163905
27       1.4313638

Here the successive differences diminish regularly and the
sixth difference is not greater than .0000001.

In applications to statistics we do not in general know 
the function and we have to assume that it exists and can be 
expanded in a series whose convergence is sufficiently rapid 
to allow us to neglect all terms after, say, the fifth, or, put 
less accurately, we assume that the causes which produce the 
totals have effects which change gradually from point to 
point, so that the variation of these changes is but slight over 
a small region. 

4. Let $y_0, \ldots y_n$ be the values of $y$ which correspond to
equally spaced values of $x$, viz. $x_0, x_0 + h, x_0 + 2h, \ldots, x_0 + nh$.

Then the coefficients in equation ($\theta$) can be determined, but
the arithmetic work is very arduous, and a more useful form
is obtained in terms of differences.

Consider the equation:

$$y = y_0 + \frac{x - x_0}{h}\,\Delta_0^1 + \frac{x - x_0}{h} \cdot \frac{x - x_0 - h}{2h}\,\Delta_0^2 + \frac{x - x_0}{h} \cdot \frac{x - x_0 - h}{2h} \cdot \frac{x - x_0 - 2h}{3h}\,\Delta_0^3 + \ldots \text{ to } n+1 \text{ terms} \qquad (\iota)$$

(Newton's formula)

If $x = x_0$, $y = y_0$.

If $x = x_0 + h$, $y = y_0 + \Delta_0^1 = y_1$.

If $x = x_0 + 2h$, $y = y_0 + 2\Delta_0^1 + \Delta_0^2 = y_2$.

If $x = x_0 + rh$, $y = y_0 + r\,\Delta_0^1 + \frac{r(r-1)}{1 \cdot 2}\,\Delta_0^2 + \ldots$ to $r+1$ terms, the
subsequent terms vanishing, and therefore by equation ($\gamma$), $y = y_r$.

Hence ($\iota$), which is easily seen to be of the $n$th degree, is
satisfied by the $n + 1$ pairs of values in question.

E.g., to find y = log 20.5 from the table above.

\(x_0 = 20\), \(h = 1\), \(x = x_0 + .5\), \(y_0 = 1.3010300\), \(\Delta_0^1 = .0211893\), etc.

\[
\begin{aligned}
y = 1.3010300 &+ .5\,(.0211893) + \tfrac{1}{2}(.5)(-.5)(-.0009859) \\
 &+ \tfrac{1}{6}(.5)(-.5)(-1.5)(.0000876) + \tfrac{1}{24}(.5)(-.5)(-1.5)(-2.5)(-.0000110) \\
 &+ \tfrac{1}{120}(.5)(-.5)(-1.5)(-2.5)(-3.5)(.0000015).
\end{aligned}
\]

Using the first two terms, we have y = 1.3116247.
Using the first three terms, we have y = 1.3117479.
Using the first four terms, we have y = 1.3117534.
Using the first five terms, we have y = 1.3117538.
Using all terms, we have y = 1.3117538.

The true value is 1.3117539.
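The successive approximations just quoted can be reproduced by a short routine. This is a sketch under stated assumptions rather than the author's own procedure: the name `newton_forward_partial_sums` is hypothetical, and the six tabulated logarithms of 20 to 25 are taken as the data.

```python
def newton_forward_partial_sums(x0, h, ys, x):
    """Partial sums of Newton's ascending-difference formula at x.

    ys are the tabulated values at x0, x0 + h, x0 + 2h, ...
    Element k of the result uses differences up to order k.
    """
    # Leading differences Delta_0^k: the first entry of each difference row.
    row = list(ys)
    leading = []
    while row:
        leading.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]

    t = (x - x0) / h
    coeff = 1.0  # t(t-1)...(t-k+1)/k!
    total = 0.0
    sums = []
    for k, delta in enumerate(leading):
        total += coeff * delta
        sums.append(total)
        coeff *= (t - k) / (k + 1)
    return sums


logs_20_to_25 = [1.3010300, 1.3222193, 1.3424227,
                 1.3617278, 1.3802112, 1.3979400]
sums = newton_forward_partial_sums(20, 1, logs_20_to_25, 20.5)
for k, s in enumerate(sums):
    print(f"through difference of order {k}: {s:.8f}")
```

Each entry of `sums` corresponds to one line of the list above, the last using all six terms.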

Applications to statistical data are given below, p. 233. 

5. Conversely, if we know y, we have an equation for x,
which can be solved by Horner's method or otherwise.

Thus to determine the median using four observations we
may proceed as follows. Let there be \(y_0, y_1, y_2, y_3\) persons
whose wages are less than \(x_0,\ x_0+h,\ x_0+2h,\ x_0+3h\) units
respectively, and let there be \((2y_m - 1)\) persons all together,
so that the value of x, \(x_m\), corresponding to \(y_m\) is the median.

Then

\[
y_m = y_0 + \frac{x_m-x_0}{h}\,\Delta_0^1
    + \frac{x_m-x_0}{h}\cdot\frac{x_m-x_0-h}{2h}\,\Delta_0^2
    + \frac{x_m-x_0}{h}\cdot\frac{x_m-x_0-h}{2h}\cdot\frac{x_m-x_0-2h}{3h}\,\Delta_0^3,
\]

a cubic equation to determine \(x_m\).

We are free, of course, to take \(x_0\) as the beginning of any
grade we please, and it should be so chosen that the median
is in the central grade included in the interpolation. Thus if
we use the cubic equation just written the grade \(x_0 + h\) to
\(x_0 + 2h\) should be that containing the median.

The formula on p. 107 (2) is obtained by neglecting the 2nd
and higher differences, and taking the grade \(x_0\) to \(x_0 + h\) to
include the median. Then \(y_m = y_0 + \frac{x_m - x_0}{h}(y_1 - y_0)\), and
therefore

\[
x_m = x_0 + \frac{y_m - y_0}{y_1 - y_0}\cdot h.
\]

To find the mode we again take y as the cumulative number
up to the value x. It is found that it is simplest and generally
sufficient to depend on four observations, such that the mode
is between the second and the third. We then use the first
four terms of equation (κ) and find for what value of x the
curve is steepest and therefore the number of cases per unit
of the abscissa is greatest. \(D_x y\) is to be a maximum, and
therefore \(D_x^2 y\) zero.

Hence

\[
x = x_0 + h - h\,\frac{\Delta_0^2}{\Delta_0^3}
  = x_0 + h + h\,\frac{u_2 - u_1}{(u_2 - u_1) + (u_2 - u_3)},
\]

where \(u_1, u_2, u_3\) are written for \(y_1 - y_0,\ y_2 - y_1,\ y_3 - y_2\) and are the
number of cases between \(x_0\) and \(x_0+h\), \(x_0+h\) and \(x_0+2h\), and
\(x_0+2h\) and \(x_0+3h\) respectively. If the mode is in the second
grade \(u_2 > u_1\) and \(u_2 > u_3\). The formula shows how the interval
\(x_0+h\) to \(x_0+2h\) is to be divided to obtain the position of the
mode (see p. 100).

Here the fourth differences of the y's, that is the third differences
of the u's, are neglected.
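As a brief illustration of the rule for dividing the modal grade, the sketch below applies it to three grade counts. The function name and the numbers (grades of width 5 beginning at 10, containing 20, 50 and 40 cases) are hypothetical and chosen only to show the arithmetic.

```python
def mode_from_three_grades(x0, h, u1, u2, u3):
    """Position of the mode when it lies in the second of three equal grades.

    u1, u2, u3 are the numbers of cases in the grades beginning at
    x0, x0 + h and x0 + 2h. Third differences of the u's are neglected.
    """
    if not (u2 > u1 and u2 > u3):
        raise ValueError("the second grade must contain the most cases")
    d2 = u2 - u1            # second difference of the cumulative numbers
    d3 = u3 - 2 * u2 + u1   # third difference of the cumulative numbers
    return x0 + h - h * d2 / d3


# Hypothetical grades 10-15, 15-20, 20-25 with 20, 50 and 40 cases.
mode = mode_from_three_grades(10, 5, 20, 50, 40)
print(mode)
```

The mode falls nearer the upper end of the middle grade because the third grade is fuller than the first.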


6. Central Differences. In interpolation we generally have
to depend on those values of y with regard to which the region
where we wish to ascertain values is centrally situated, and
formula (θ) is in some respects awkward for that purpose.
Equivalent formulae, which avoid the want of symmetry, have
been devised, in which so-called "central differences" are used.
No new principle is involved, for these formulae are obtainable
by transformation from (θ). The differences hitherto
used may be distinguished as "ascending differences."

A suitable notation is as follows:

x                          y
\(x_{-2} = x_0 - 2h\)      \(y_{-2}\)
\(x_{-1} = x_0 - h\)       \(y_{-1}\)
\(x_0\)                    \(y_0\)
\(x_1 = x_0 + h\)          \(y_1\)
\(x_2 = x_0 + 2h\)         \(y_2\)
\(x_3 = x_0 + 3h\)         \(y_3\)

Here \(\delta_{1/2} = y_1 - y_0\); \(\delta_0^2 = \delta_{1/2} - \delta_{-1/2} = y_1 - 2y_0 + y_{-1}\);
\(\delta_0^4 = y_2 - 4y_1 + 6y_0 - 4y_{-1} + y_{-2}\), etc.

Let the value of x for which a value of y is to be found divide
the interval \(x_0\) to \(x_1\) in the ratio p : q, so that \(x = x_0 + ph = x_1 - qh\)
and \(p + q = 1\).

Then it will be found by substitution that the formula

\[
y_x = p\,y_1 + q\,y_0 - \tfrac{1}{6}pq\{(p+1)\delta_1^2 + (q+1)\delta_0^2\}
    + \tfrac{1}{120}pq(p+1)(q+1)\{(p+2)\delta_1^4 + (q+2)\delta_0^4\} \qquad (\lambda)
\]

which (by writing q = 1 − p) is seen to be a rational integral function
of the 5th degree in p, and similarly in q, is satisfied by the six
pairs of values \((x_{-2}, y_{-2}), (x_{-1}, y_{-1}), \ldots, (x_3, y_3)\); while, if the term
involving the 4th differences is omitted, the four pairs \((x_{-1}, y_{-1}),
\ldots, (x_2, y_2)\) satisfy it.

As an illustration of the notation we may write \(y_1\) = log 23
in the table on p. 226, and taking p = .2 calculate log 22.2.

\[
\begin{aligned}
\log 22.2 &= .2\log 23 + .8\log 22 \\
 &\quad - \tfrac{.16}{6}\{1.2\,(-.0008217) + 1.8\,(-.0008983)\} \\
 &\quad + \tfrac{.16 \times 1.2 \times 1.8}{120}\{2.2\,(-.0000095) + 2.8\,(-.0000110)\} \\
 &= 1.3462837 + .0000694 - .0000002 = 1.3463529.
\end{aligned}
\]

The true value is 1.3463530.
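The central-difference calculation of log 22.2 may be checked as follows. This is only a sketch: the function name `everett` and the argument layout (six consecutive tabulated values, the third being \(y_0\)) are assumptions made for the illustration, and the formula is carried to fourth differences as in the example.

```python
def everett(values, p):
    """Central-difference interpolation carried to fourth differences.

    values = (y_-2, y_-1, y_0, y_1, y_2, y_3) at equally spaced abscissae;
    p is the fraction of the interval from x_0 to x_1, and q = 1 - p.
    """
    ym2, ym1, y0, y1, y2, y3 = values
    q = 1 - p
    d2_0 = y1 - 2 * y0 + ym1
    d2_1 = y2 - 2 * y1 + y0
    d4_0 = y2 - 4 * y1 + 6 * y0 - 4 * ym1 + ym2
    d4_1 = y3 - 4 * y2 + 6 * y1 - 4 * y0 + ym1
    return (p * y1 + q * y0
            - p * q / 6 * ((p + 1) * d2_1 + (q + 1) * d2_0)
            + p * q * (p + 1) * (q + 1) / 120
            * ((p + 2) * d4_1 + (q + 2) * d4_0))


# Seven-figure logarithms of 20..25; y_0 = log 22 and y_1 = log 23.
logs = [1.3010300, 1.3222193, 1.3424227, 1.3617278, 1.3802112, 1.3979400]
estimate = everett(logs, 0.2)
print(f"{estimate:.8f}")
```

With p = 0 or p = 1 the expression reduces to \(y_0\) or \(y_1\), which is a convenient check on the signs.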


The importance of the formula is, however, more apparent
when we have no general algebraic function, but wish to
interpolate from neighbouring values only.


7. Lagrange's Formula. The formulae (ζ), (η), (θ), (κ) and (λ)
all relate to the case where the observed values of x are equidistant
each from the next. There is no such simple method
of interpolation where the distances are not equal. An equation
is given by Lagrange which is of the nth degree and is satisfied
by the n + 1 pairs of values \((x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)\) whatever
the relation between the x's may be, and it may be written
as follows:

\[
\begin{aligned}
y = {} & y_0\,\frac{(x-x_1)(x-x_2)\ldots(x-x_n)}{(x_0-x_1)(x_0-x_2)\ldots(x_0-x_n)}
      + y_1\,\frac{(x-x_0)(x-x_2)\ldots(x-x_n)}{(x_1-x_0)(x_1-x_2)\ldots(x_1-x_n)} \\
     & + \ldots + y_n\,\frac{(x-x_0)(x-x_1)\ldots(x-x_{n-1})}{(x_n-x_0)(x_n-x_1)\ldots(x_n-x_{n-1})} \qquad (\mu)
\end{aligned}
\]

The numerator in any fraction, say the multiplier of \(y_t\), is
obtained by multiplying the factors \((x - x_0)(x - x_1) \ldots
(x - x_n)\), omitting \(x - x_t\); the denominator is obtained from
the numerator by writing \(x_t\) for x.

It is evident that when \(x = x_t\) every fraction is zero except
the multiplier of \(y_t\), which is unity, and therefore \(y = y_t\).
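The construction of the multipliers described above translates directly into a few lines of code. The following is a minimal sketch; the function name `lagrange` is illustrative, and the trial uses four unequally spaced entries of the logarithm table (20, 21, 23, 26) to estimate log 22, a choice made here purely for demonstration.

```python
def lagrange(points, x):
    """Value at x of the polynomial through the given (x_t, y_t) pairs."""
    total = 0.0
    for t, (xt, yt) in enumerate(points):
        multiplier = 1.0
        for s, (xs, _) in enumerate(points):
            if s != t:
                # Numerator factor (x - x_s) over denominator factor (x_t - x_s).
                multiplier *= (x - xs) / (xt - xs)
        total += yt * multiplier
    return total


points = [(20, 1.3010300), (21, 1.3222193), (23, 1.3617278), (26, 1.4149733)]
estimate = lagrange(points, 22)
print(f"{estimate:.7f}")
```

At each tabulated abscissa the routine returns the tabulated ordinate, since every multiplier but one vanishes there.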
8. We may now reconsider the assumptions made when we
took equation (θ) to express the relation between y and x.

If y and x are connected by any functional law, that is if
y is determinate for all given values of x, without which assumption
most problems of interpolation are meaningless, then y
can be expressed as a function of x, say y = f(x). If the
function and its derivatives are continuous then by Maclaurin's
Theorem:

\[
y = f(0) + x f'(0) + \frac{x^2}{2!} f''(0) + \ldots + \frac{x^n}{n!} f^{n}(0) + \ldots \text{ continued indefinitely.}
\]

If \(f^{n+1}(0)\) and following coefficients are very small, and x
is never large, the terms from the (n + 2)nd onwards become
negligible in comparison with earlier terms, so that the first
n + 1 terms determine the value of y approximately. Now
by the equations (i) and (ii), p. 224, \(f^{n+1}\) is small when \(\Delta^{n+1},
\Delta^{n+2}, \ldots\) are small, and vice versa. Hence we have the
following general statement: any functional relation between
y and x reduces to the parabolic equation of the nth degree (θ),
if the differences of orders higher than the nth vanish, and
if these differences do not vanish but are small, equation (θ)
is still an approximate expression for the relation.

Now if the line drawn through the given points is to have 
continuous and slowly changing curvature, it is easily verified 
that the second differences for points near together are not 
large, for a rapid change in the rate of increase of the ordinate 
means a rapid change of curvature; and if we construct a 
second curve with the same abscissae and the first differences 
as ordinates, small third differences will indicate absence of 
rapid change in the first, and so on ; but beyond this point 
it is not easy to see the connection between the hypothesis 
underlying interpolation and the diminution of successive 
differences. The converse, however, is clearer; if in any 
series of figures it is found experimentally that the successive 
differences tend to disappear, then any curve which passes 
through the points is expressed approximately by the parabolic
equation. De Morgan states this conclusion thus:
“If we take n points near each other, and having their abscissae
in arithmetic progression, with a small or at least not very
large common difference, and their ordinates not very unequal
... the parabola of the (n − 1)th order will very nearly coincide
with any regular curve of the same general appearance, at
least between the same points.” Boole's explanation is:
“It is customary to assume for the general expression of the
values under consideration a rational and integral function
of x, and to determine the constants by the given conditions.
This assumption rests upon the supposition (a supposition,
however, actually verified in the case of all tabulated functions*)
that the successive orders of differences rapidly diminish.”
Since, from equation (i), p. 224, when h is small, the influences
of the successive differences for any curve are smaller
as their order becomes higher, it is a legitimate process to
build up a series of values of any function on the hypothesis
that the higher differences vanish.

If a freehand curve is drawn so as to pass through the
chosen fixed points, and to have curvature which changes as
slowly as possible, a line will be obtained which lies very near
that given by equation (θ). Such a line would be similar to
the track of a bicyclist who was riding so as to pass over several
marks, or just to avoid several obstacles.

9. It is clear from the above analysis that we can make a
smooth continuous curve pass through any number of points
we please; for with the parabolic equation (θ) there are never
any sudden jumps in the values of y, \(\frac{dy}{dx}\) or \(\frac{d^2y}{dx^2}\) as x changes
continuously; and we can obtain as many linear equations
(which have always real values) as there are constants, simply by
taking n in the original equation to be the number of fixed points.

* That is, mathematical functions such as \(\int_0^x e^{-x^2}\,dx\), not statistical
approximations.

If we have, let us say, 10 points, as

[Diagram: a series of plotted points, among them D, E, F, G, H and K.]
and wish to find a point on a fixed vertical line between F and
G, we can either take only F and G into consideration, and,
joining them by a straight line, obtain the point \(x_1\); or considering
E, F, and G, or F, G, and H, draw parabolas and obtain
\(x_2\) or \(x_3\); or considering E, F, G, and H, draw a parabola of the
third order, which would have a point of inflexion near F;
this would be approximately the path a bicyclist might follow
if he had to start from E, and ride to a near point H, passing
close to F and G. If we now include D and K (if our bicyclist
has to start from D, pass E, F, G, and H, and reach K) we shall
modify the curvature throughout; and as we include more
and more points shall continue to affect slightly the path F G.
If the inclusion of the nearer points tends to make the line F G
approximate more and more closely to a final position, while
the further inclusion of the more distant points throws it
further away, we may conclude that the positions of these
further points are not governed by the same numerical conditions
as the nearer ones. Thus in a “table of survivals”
the figures for ages under 5 years are not distributed in
accordance with the curve determined by the figures for higher
ages; in a table showing wages, it may be seen that those of
highly paid workmen are not governed by the same causes as
those lower in the scale. On the other hand, the number in
each census is dependent on all the previous numbers for more
than one generation. In interpolating for the population of
1876 we shall obtain different figures according as we include
1851, '61, '71, '81, '91 only, or 1901 as well; and this is not
surprising, for a mistake made in 1876 may not come to light
till we have watched the growth of the population for twenty-five
years. It is clear that the points far from the period in
which the interpolation is to be done cannot be allowed so
much influence as those nearer, and it appears experimentally
that this condition is fulfilled in the method discussed; also,
in series (κ) the successive coefficients begin to diminish with
the rth term where \(x < x_0 + (2r - 3)h\), that is with the coefficient
of the first difference when x is between \(x_0\) and \(x_0 + h\).
It may be noticed that the wanderings of the curve are limited
by the condition that a curve of the (n − 1)th order cannot have
more than n − 3 points of inflexion, for \(\frac{d^2y}{dx^2}\) has no term of a
higher degree than \(x^{n-3}\).
In the above illustration the intermediate points from F to
G might be found from the five points D, E, F, G, H, or from
E, F, G, H, K. These two curves may be welded together
between F and G. The points near F are more accurately
determined by the first, of which it is the middle; those near
G by the second. The welding line should touch the first at
F, the second at G. This is conveniently done by the use of
the sine curve. This method is employed, I believe, at the
Registrar-General's office.

It cannot be said that the present theory of statistical 
interpolation rests on an altogether satisfactory basis.* The 
principles which govern it are not well defined, and the 
mathematical analysis of the methods, by which the principles 
should be brought into relation with the facts, is incomplete. 
Yet it is perhaps unnecessary to labour after more refined 
methods, for interpolation cannot be precise unless we actually 
know the algebraic expression of the laws which govern the 
figures, and the method here discussed is found to satisfy the 
conditions empirically, while further refinements could only 
introduce slight modifications. 

10. Examples showing the Numerical Use of the Formula.
(1) Given the number of wage-earners earning sums in 5s.
groups, to estimate the number earning as much as 24s. and
not so much as 25s.

Earning as much     Numbers per 1,000      Differences.
as 10s., and        Wage-Earners
less than           (Adult males).†        1st.     2nd.     3rd.     4th.
15s.                 39                    257       46     -144
20s.                296                    303      -98        7       18
25s.                599                    205      -91       25
30s.                804                    114      -66
35s.                918                     48
40s.                966

† General Report on Wages (C. 6889; year 1893).

* This remark does not apply to the interpolation in evaluating mathematical
functions.
Neglect the increasing differences arising from the number
earning less than 15s.

Using formula (κ), \(x_0 = 20\) (shillings), \(h = 5\), \(y_0 = 296\), \(\Delta_0^1 = 303\),
\(\Delta_0^2 = -98\), \(\Delta_0^3 = 7\), \(\Delta_0^4 = 18\).

At 25s., y = 599, from above table.

At 24s., x = 24,

\[
\begin{aligned}
y &= 296 + \tfrac{4}{5}(303) + \tfrac{4}{5}\cdot\tfrac{-1}{10}(-98)
   + \tfrac{4}{5}\cdot\tfrac{-1}{10}\cdot\tfrac{-6}{15}(7)
   + \tfrac{4}{5}\cdot\tfrac{-1}{10}\cdot\tfrac{-6}{15}\cdot\tfrac{-11}{20}(18) \\
  &= 296 + 242.4 + 7.84 + .224 - .3168 = 546 \text{ (nearly)}.
\end{aligned}
\]

The required number is therefore 599 − 546 = 53.

Again at 23s., \(x = x_0 + 3\), y = 489, and the number earning as
much as 23s. and not so much as 24s. is 57.


(2) To make an estimate for the value of imports in the
year 1813, the records for which were destroyed by fire.

Given value of imports in:

1810     £39,202,000     \(y_1\)
1811      26,510,000     \(y_2\)
1812      26,163,000     \(y_3\)
1813                     \(y_4\)
1814      33,755,000     \(y_5\)
1815      32,987,000     \(y_6\)
1816      27,431,000     \(y_7\)

From formula (η), using \(y_3\) and \(y_5\) only, and assuming that
2nd differences vanish,

\(y_5 - 2y_4 + y_3 = 0\), \(y_4 = 29{,}959\) (thousand pounds).

From formula (η), using \(y_2\) and \(y_6\) as well, and assuming that
4th differences vanish,

\(y_6 + y_2 - 4(y_5 + y_3) + 6y_4 = 0\), \(y_4 = 30{,}029\).

From formula (η), using \(y_1\) and \(y_7\) as well, and assuming that
6th differences vanish,

\(y_7 + y_1 - 6(y_6 + y_2) + 15(y_5 + y_3) - 20y_4 = 0\), \(y_4 = 30{,}421\).

Here the first and second values are very near together,
while the third differs; hence we adopt £30,000,000 as the
value required.
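The three estimates for 1813 all follow one rule: the difference of order 2k, centred on the missing year, is set to zero and solved for the central value. A hedged sketch of that rule is given below; the helper name `fill_centre` is hypothetical, and the import values are entered in thousands of pounds.

```python
from math import comb


def fill_centre(before, after):
    """Estimate a missing central value by making the 2k-th difference vanish.

    before: the k values preceding the gap, in order of time.
    after:  the k values following the gap, in order of time.
    """
    k = len(before)
    if k == 0 or k != len(after):
        raise ValueError("need the same non-zero number of values on each side")
    series = list(before) + [None] + list(after)
    total = 0
    for j, y in enumerate(series):
        if j != k:  # skip the unknown central term
            total += (-1) ** j * comb(2 * k, j) * y
    return -total / ((-1) ** k * comb(2 * k, k))


# Value of imports in thousands of pounds; 1813 is missing.
imports = {1810: 39202, 1811: 26510, 1812: 26163,
           1814: 33755, 1815: 32987, 1816: 27431}

estimates = []
for k in (1, 2, 3):
    before = [imports[1813 - d] for d in range(k, 0, -1)]
    after = [imports[1813 + d] for d in range(1, k + 1)]
    estimates.append(round(fill_centre(before, after)))
print(estimates)
```

The coefficients 1, −2, 1; 1, −4, 6, −4, 1; and 1, −6, 15, −20, 15, −6, 1 used in the text are the signed binomial coefficients that `comb` supplies.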

(3) In Mr. Booth's Life and Labour of the People , e.g., 
Vol. V, p. 46, a series of very useful diagrams is given showing 
the age distribution of various classes. The figures he uses 
are as follows : — 
Ages.            Proportion occupied      Average at each year
                 per 10,000 of total      of age between
                 aged 10-80.              given limits.
10-15 years       193.5                    38.7
15-20   "         880                     176
20-25   "         933                     186.6
25-35   "        1636                     163.6
35-45   "        1201                     120.1
45-55   "         830                      83
55-65   "         434                      43.4
65-80   "         192.5                    12.8

His diagram is drawn from the last column, the numbers in 
which form the ordinates for the middle of the corresponding 
age periods. The points so obtained are joined by straight 
lines. This method is sufficiently accurate for his purpose, 
but it will afford an interesting example of interpolation if we 
obtain some of the figures for intermediate years more closely. 


Age.                Proportion occupied per 10,000 under x years.
15 = \(x_1\)        193.5 = \(y_1\)
20 = \(x_2\)        1073.5 = \(y_2\)
25 = \(x_3\)        2006.5 = \(y_3\)
35 = \(x_4\)        3642.5 = \(y_4\)
45 = \(x_5\)        4843.5 = \(y_5\)
55 = \(x_6\)        5673.5 = \(y_6\)
65 = \(x_7\)        6107.5 = \(y_7\)
80 = \(x_8\)        6300 = \(y_8\)

Use Lagrange's formula (μ) to determine the number under
30 years, ignoring persons over 55. Thus x = 30.

\[
\begin{aligned}
y = {} & 193.5 \times \frac{10 \cdot 5\,(-5)(-15)(-25)}{(-5)(-10)(-20)(-30)(-40)}
      + 1073.5 \times \frac{15 \cdot 5\,(-5)(-15)(-25)}{5\,(-5)(-15)(-25)(-35)} \\
     & + 2006.5 \times \frac{15 \cdot 10\,(-5)(-15)(-25)}{10 \cdot 5\,(-10)(-20)(-30)}
      + 3642.5 \times \frac{15 \cdot 10 \cdot 5\,(-15)(-25)}{20 \cdot 15 \cdot 10\,(-10)(-20)} \\
     & + 4843.5 \times \frac{15 \cdot 10 \cdot 5\,(-5)(-25)}{30 \cdot 25 \cdot 20 \cdot 10\,(-10)}
      + 5673.5 \times \frac{15 \cdot 10 \cdot 5\,(-5)(-15)}{40 \cdot 35 \cdot 30 \cdot 20 \cdot 10} \\
   = {} & 2879.
\end{aligned}
\]

Mr. Booth's diagram gives 2824.5 for the same position,
using \(y_3\) and \(y_4\) only.

If in the formula the quantities \(y_2, y_3, y_4, y_5\) only are used,
y is found to be 2869.

Lagrange's formula as used above is equivalent to the
assumption that the 6th differences vanish when the ages are
uniformly graded. Write a, b, c for the values of y at 30, 40
and 50 years.

Using formula (β) or (η) for the values \(y_1, y_2, y_3, a, y_4, b,
y_5\) we have \(y_1 - 6y_2 + 15y_3 - 20a + 15y_4 - 6b + y_5 = 0\), and
similarly

\(y_2 - 6y_3 + 15a - 20y_4 + 15b - 6y_5 + c = 0\)
and \(y_3 - 6a + 15y_4 - 20b + 15y_5 - 6c + y_6 = 0\).

Whence by straightforward solution a = 2879 as above.
This method, when applicable, is simpler than Lagrange's
formula.
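The "straightforward solution" of the three equations in a, b and c can be carried out exactly. The sketch below is one way of doing so and is not taken from the original: `solve_linear` is a small illustrative Gauss-Jordan routine working in exact fractions, and the unknowns are ordered (a, b, c).

```python
from fractions import Fraction


def solve_linear(matrix, rhs):
    """Solve a small square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(b)]
            for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


y1, y2, y3, y4, y5, y6 = (Fraction(s) for s in
                          ("193.5", "1073.5", "2006.5",
                           "3642.5", "4843.5", "5673.5"))

# Coefficients of (a, b, c) in the three vanishing sixth differences.
matrix = [[-20, -6, 0],
          [15, 15, 1],
          [-6, -20, -6]]
rhs = [-(y1 - 6 * y2 + 15 * y3 + 15 * y4 + y5),
       -(y2 - 6 * y3 - 20 * y4 - 6 * y5),
       -(y3 + 15 * y4 + 15 * y5 + y6)]

a, b, c = solve_linear(matrix, rhs)
print(float(a), float(b), float(c))
```

The values found for b and c, at ages 40 and 50, come out as by-products of the same solution.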

(4) As an example of the determination of the median 
and the mode, we will use the figures already employed on 
p. 69, which may be retabulated thus : — 


Earning                              Differences.
less than     x.      y.       1st.     2nd.      3rd.      4th.
$ .25        -1         0       317     1157     -1332
  .75         0       317      1472     -175      -152      +15
 1.25         1      1789      1297     -327      -137
 1.75         2      3086       970     -464
 2.25         3      4056       506
 2.75         4      4562

The whole number of persons is 5123. To find the median
put y = 2562, and use the entries from x = 0 to x = 4.

Then

\[
2562 = 317 + 1472x - \tfrac{1}{2}(175)\,x(x-1) - \tfrac{1}{6}(152)\,x(x-1)(x-2)
     + \tfrac{1}{24}(15)\,x(x-1)(x-2)(x-3),
\]

if we stop at the 4th difference.

\(61488 = 7608 + 36122x - 111x^2 - 698x^3 + 15x^4\), and the solution
by Horner's method is x = 1.5715.

Hence the median is at $.75 + 1.5715 of .50 = $1.536.
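The quartic for the median can also be solved numerically. The following sketch substitutes simple bisection for Horner's method, which is a change of technique made here for brevity; the function names are illustrative. The root it finds agrees with the value quoted above to about three decimal places.

```python
def cumulative(x):
    """Cumulative number of persons by Newton's formula through the 4th difference."""
    return (317 + 1472 * x
            - 175 / 2 * x * (x - 1)
            - 152 / 6 * x * (x - 1) * (x - 2)
            + 15 / 24 * x * (x - 1) * (x - 2) * (x - 3))


def solve_increasing(f, target, lo, hi, tol=1e-9):
    """Bisection for f(x) = target, assuming f increases from lo to hi."""
    if not f(lo) <= target <= f(hi):
        raise ValueError("target is not bracketed")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x_median = solve_increasing(cumulative, 2562, 1.0, 2.0)
median = 0.75 + 0.50 * x_median  # grades are 50 cents wide, x = 0 at $.75
print(x_median, median)
```

The bracket from x = 1 to x = 2 is used because the tabulated cumulative numbers 1789 and 3086 lie on either side of 2562.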

Another method is to suppose x expressed as a function of y,*
and to write Lagrange's formula:

\[
x = x_0\,\frac{(y-y_1)(y-y_2)(y-y_3)}{(y_0-y_1)(y_0-y_2)(y_0-y_3)} + \ldots + \ldots + \ldots
\]

* Cf. Edgeworth in the Statistical Journal, 1898, p. 698.

If we use four entries only in the above table, we have:

\[
x = \frac{(2562-1789)(2562-3086)(2562-4056)}{(-1472)(-2769)(-3739)} \times 0 + \ldots + \ldots + \ldots,
\]

whence x = 1.5624 and the median is $1.531.

This method is suitable for working on a calculating
machine.

To find the mode use the entries from x = −1 to x = 2.

The second and third differences in the formula of p. 228
are now 1157 and −1332.

The required value is $.75 + 1157/1332 of .50 = $1.18.

Variations of method can be used, leading to slightly 
different results. The mode is, in fact, not precisely determinate 
when the grading is so wide and the higher differences do not 
tend to zero. 

This method is applicable to such problems as the deter- 
mination of the date at which the population, the marriage, 
birth, and death rates, etc., increased most rapidly; at what 
age the chance of death increases most, etc.* 

11. An important group of problems of interpolation arises
when the original returns have to be corrected, e.g., the determination
of the distribution by age from the census returns.

We have now the problem of drawing a smooth line in the 
neighbourhood of a great number of points, but not necessarily 
through any of them. The assumption is that the returns are 
insufficient in number or deficient in accuracy, and that they 
indicate a regular distribution which it is required to represent. 

(1) One method is to assume that the averages over fairly 
large groups are accurate, and to these averages to apply any 
of the methods already discussed. 

(2) A second method has been used in the section in which
various curves were smoothed (vide supra, Chapter VII). This
may be restated as follows: Take successive groups of 2, or
3, or 4 .... 10 points, beginning again and again at the
ordinates for each of the given abscissae. Find the centres of
gravity of each group; that is, erect an ordinate equal to the
average of the ordinates of a group at the point half-way
between the ends of the abscissae of the outside ordinates of the
group. Draw a line through the points so obtained. It will
be found that this line satisfies all the conditions laid down. An
example of this method is given in the diagram facing p. 134.

* Cf. Edgeworth, in Statistical Journal, 1899, p. 381, and the references
there given.

(3) In another method the original figures are smoothed
till the differences of the fourth or fifth or higher orders vanish;
and then the ordinary formulae of interpolation are applied.
Thus in example 1, on p. 233, rewrite the table thus:

Wages           Smoothed           Corrected Differences.
above 15s.      Numbers.           1st.             2nd.               3rd.
Up to 20s.      296                303 + a          -98 - a + b        7 - 3b
  "   25s.      599 + a            205 + b          -91 - a - 2b       25 + 2a + 3b
  "   30s.      804 + a + b        114 - a - b      -66 + a + b
  "   35s.      918                48
  "   40s.      966

If we put b = 2⅓, a = −16, the third differences vanish,
and we have \(\Delta_0^1 = 287\), \(\Delta_0^2 = -79\tfrac{2}{3}\), \(\Delta_0^3 = \Delta_0^4 = 0\); when
x = 25, y = 583, and when

x = 24, y = 296 + 4/5 of 287 − 2/25 of (−79⅔) = 531.97,

so that the number earning as much as 24s. and not so much as
25s. is now found to be 51, instead of 53.
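The corrections a and b, and the smoothed values that follow from them, can be verified with exact fractions. This is a small sketch only; the function name `smoothed` is hypothetical, and it evaluates Newton's formula with the corrected first and second differences, the third being zero.

```python
from fractions import Fraction

# The corrected third differences 7 - 3b and 25 + 2a + 3b are both to vanish.
b = Fraction(7, 3)
a = (Fraction(-25) - 3 * b) / 2

d1 = 303 + a        # corrected first difference
d2 = -98 - a + b    # corrected second difference


def smoothed(x):
    """Smoothed cumulative number earning less than x shillings."""
    t = Fraction(x - 20, 5)
    return 296 + t * d1 + t * (t - 1) / 2 * d2


print(a, b, float(smoothed(24)), float(smoothed(25) - smoothed(24)))
```

Evaluating `smoothed(30)` returns 804 + a + b exactly, so the smoothed curve passes through the corrected entry at 30s.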

The corrections may be applied to any of the original figures. 
We need to solve only one more equation to complete our 
table from 20s. to 30s. 

When x = 23, y = 296 + 3/5 of 287 + 3/25 of 79⅔. The
difference between this and the value of y, when x = 24, is
1/5 of 287 − 1/25 of 79⅔ = 54.21.

We have therefore the following table, where the figures
in italics have already been calculated, while the others are
added on the assumption that the third differences are zero.

Wages.          Numbers.      Differences.
                              1st.       2nd.      3rd.
Up to 20s.      296           63.75      -3.18     0
  "   21s.      360           60.57      -3.18     0
  "   22s.      420           57.39      -3.18     0
  "   23s.      478           54.21      -3.18     0
  "   24s.      532           51.03      -3.18     0
  "   25s.      583           47.85      -3.18     0
  "   26s.      631           44.67      -3.18     0
  "   27s.      676           41.49      -3.18     0
  "   28s.      717           38.31      -3.18
  "   29s.      755           35.13
  "   30s.      790

If we had taken the second differences more exactly, we
should have obtained 804 + a + b = 790⅓ for the last figure
as in the previous table.

This method of writing down many figures when the significant
differences have been found can be very generally applied
also in the cases where the data are exact.

(4) Another method, involving higher mathematics, would 
be discussed more suitably after the section devoted to the 
law of error ; a brief explanation with a useful formula may, 
however, be offered here. 

Suppose we have five consecutive points \((-2, y_{-2})\),
\((-1, y_{-1})\), \((0, y_0)\), \((1, y_1)\), \((2, y_2)\) given.

A parabola of the fourth order could be drawn through these
five points, but would have two points of inflexion. A great
number of parabolas of the third order can be drawn near all
the points, having no points of inflexion, and satisfying all the
ordinary conditions of interpolation.

Borrowing a principle from the method of least squares,*
we assume that if the coefficients of the parabola

\[ y = a + bx + cx^2 + dx^3 \]

are chosen so as to make the quantity

\[ \sum (a + bx + cx^2 + dx^3 - y)^2 \]

(where the summation extends over the five pairs of values of
x and y) a minimum, the parabola so determined will be the
best for the purpose.

For the necessary mathematical analysis, Professor Darwin's
paper On Fallible Measures,† from which this method is taken,
should be consulted.

The following equation is obtained:

\[ a = y_0 - \tfrac{3}{35}\,\Delta_0^4, \]

where \(\Delta_0^4\) is the difference of the fourth order for the y's.

Now replace the point \((0, y_0)\) by the intersection of its
ordinate with the parabola, that is by (0, a), where a has the
value just given, that is, diminish \(y_0\) by the quantity \(\tfrac{3}{35}\,\Delta_0^4\).

Repeat the same process for each point on the original line,
taking it as the middle of a group of 5, and a smooth curve
lying very near all the original points is obtained.

Thus we may smooth line C in diagram facing p. 146.

* See Part II, Appendix, Note 10.
† See Phil. Mag. and Journal, July 1877.
Imported Wheat per head of the Population.

The 1st and 3rd differences are entered against the earlier year, the 2nd and 4th against the central year.

Year.    lbs.     1st.    2nd.    3rd.    4th.     Smoothed Figures.
1890     226       18
1891     244        1     -17      19
1892     245        3       2       3     -16      245 + 3/35 of 16 = 246.4
1893     248        8       5      16      13      248 - 3/35 of 13 = 247
1894     256       29      21     -78     -94      256 + 3/35 of 94 = 264
1895     285      -28     -57      56     134      285 - 3/35 of 134 = 273.5
1896     257      -29      -1      40     -16      257 + 3/35 of 16 = 258.4
1897     228       10      39
1898     238

The statistics of wheat consumption are inexact because 
of the variation of the stocks at the end of each year, of which 
no record was available. Hence it is reasonable to regard the 
numbers as subject to amendment and smooth off irregularities. 
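The smoothing rule, which diminishes each figure by 3/35 of its central fourth difference, is easily applied to the whole wheat series. The sketch below is illustrative only: the function name `smooth_five_point` is an assumption, and the two years at each end are left unsmoothed because no fourth difference is centred on them.

```python
from fractions import Fraction


def smooth_five_point(ys):
    """Replace each interior value by y - 3/35 of its central 4th difference.

    This is the ordinate, at the middle abscissa, of the least-squares
    cubic fitted to five consecutive equally spaced values.
    """
    smoothed = []
    for i in range(2, len(ys) - 2):
        d4 = ys[i - 2] - 4 * ys[i - 1] + 6 * ys[i] - 4 * ys[i + 1] + ys[i + 2]
        smoothed.append(ys[i] - Fraction(3, 35) * d4)
    return smoothed


# Imported wheat per head, lbs., for the years 1890 to 1898.
wheat = [226, 244, 245, 248, 256, 285, 257, 228, 238]
result = smooth_five_point(wheat)  # values for 1892 to 1896
for year, value in zip(range(1892, 1897), result):
    print(year, round(float(value), 1))
```

The same rule is equivalent to a weighted mean of the five values with weights −3, 12, 17, 12, −3 divided by 35.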

(5) A more general problem of interpolation is to find an
algebraic formula, other than the parabolic equation so far
used, which expresses a whole series or group. A short introduction
to such formulae will be found in Part II, Chap. V,
below.


Note. Formula (λ) is due to Professor Everett, who gave the general
term and proof (Quarterly Journal of Pure and Applied Mathematics, No. 128,
1901, formula G). A proof can be obtained as follows:

If \(f(x) = \cosh\left(2q\sinh^{-1}\tfrac{x}{2}\right)\), it is readily shown that \(f^{n+2}(0) = \left(q^2 - \tfrac{1}{4}n^2\right) f^{n}(0)\),
and thence by Maclaurin's Theorem the expansion of f(x) is

\[
1 + \frac{q^2}{2!}x^2 + \frac{q^2(q^2-1)}{4!}x^4 + \ldots = \cosh\left(2q\sinh^{-1}\tfrac{x}{2}\right).
\]

After differentiating and dividing by qx we obtain

\[
q + \frac{q(q^2-1)}{3!}x^2 + \frac{q(q^2-1)(q^2-4)}{5!}x^4 + \ldots
 = \frac{\sinh\left(2q\sinh^{-1}\frac{x}{2}\right)}{x\sqrt{1+\frac{1}{4}x^2}}
 = \frac{\sinh(qhD)}{\sinh(hD)}, \quad\text{where } x = 2\sinh\left(\tfrac{hD}{2}\right).
\]

In the notation of p. 228, \(\delta_0^2 = (e^{hD} - 2 + e^{-hD})\,y_0 = \left(e^{hD/2} - e^{-hD/2}\right)^2 y_0\),
so that the operator \(2\sinh\left(\tfrac{hD}{2}\right)\) is δ. Also \(y_x = e^{phD}(y_0)\) and \(y_1 = e^{hD}(y_0)\).

Since p + q = 1, identically,

\[
e^{phD} = \frac{\sinh(qhD)}{\sinh(hD)} + \frac{\sinh(phD)}{\sinh(hD)}\,e^{hD},
\]

so that

\[
y_x = \frac{\sinh(qhD)}{\sinh(hD)}\,y_0 + \frac{\sinh(phD)}{\sinh(hD)}\,y_1.
\]

x in the above series is identified as δ, and we have, using the series first
as expressing operators on \(y_0\), and secondly (after p is written for q) as
expressing operators on \(y_1\),

\[
y_x = q\,y_0 + \frac{1}{3!}\,q(q^2-1)\,\delta_0^2 + \ldots + p\,y_1 + \frac{1}{3!}\,p(p^2-1)\,\delta_1^2 + \ldots
\]

that is formula (λ) generalized.

For further information on the subject of interpolation, the reader is
referred to Dr. Farr's Life Table (No. 3), 1864, Boole's Finite Differences,
Text-Book of Institute of Actuaries, Part II., p. 420 seq., Rice's Theory and
Practice of Interpolation, 1899, Merrifield On Quadratures and Interpolation
(British Association Report, 1880), Chauvenet's Spherical and Practical
Astronomy (Chap. II.), Woolhouse in the Assurance Magazine (Vols. XI.,
XII.), Professor J. D. Everett On the Algebra of Difference Tables (Quarterly
Journal of Mathematics, No. 124, 1900), On a Central-difference Interpolation
Formula (British Association Report, 1900), and in the Journal of the
Institute of Actuaries, January 1901, and Dr. W. F. Sheppard's Papers On
Central Difference Formulae (Proceedings of the London Mathematical Society,
Vol. XXXI., Nos. 707-710), and On the Use of Auxiliary Curves in Statistics
of Continuous Variation (Statistical Journal, September 1900). In these
other references will be found.
PART II


APPLICATIONS OF MATHEMATICS TO 
STATISTICS. 


CHAPTER I. 

INTRODUCTORY. FREQUENCY CURVES. 

Introductory. 

Mathematical processes are essential in very many parts 
of the statistical field, and in the first part of this book 
algebraic methods have been used for the generalisation of 
arithmetical results and for the simpler cases of interpolation. 
There are, however, many classes of problems which necessitate 
mathematical treatment of a rather special nature, and it is 
to the consideration of some of these that this second part is 
devoted. The whole field is too wide to cover, and selection 
has been made of those methods which are fundamental and 
of those problems which are of direct interest to students of 
political economy and allied sciences. Essentially the same 
methods are needed for statistical problems in medicine, 
biology and other sciences, and their use can be followed in 
the appropriate journals. Here it has seemed best to keep, 
as a general principle, to those questions which have arisen 
in connection with economic and social investigation, and to 
take examples mainly from this limited region. 

So far as the manifold and diverse applications can be
classified, they fall into three groups: (1) the systematic
description of groups, (2) the measurement of relationship
between phenomena, (3) the measurement of the precision of
results obtained by a process of sampling. The background
of the great part of the relevant analysis is the theory of chance,
carried to a point which is reached only by the relatively small
number of mathematicians who have specialised in that
subject. Since it cannot be assumed that readers are familiar
with any but the simpler cases of algebraic probability,* and
there are no familiar text-books in English to which reference
can be made, it has been necessary to devote a good deal of
space to purely mathematical treatment; but an effort has
been made to render the treatment intelligible to those who
have had some mathematical training, but are not specialists
in the subject. Thus where possible the proofs have been
given without the use of the Infinitesimal Calculus; the
results have been stated as clearly as possible in words and
illustrated by arithmetical examples; the simplest cases have
been dealt with first to elucidate the processes and results,
while the more general treatment has been given in outline
with reference to papers or journals where a complete analysis
has been found. Non-mathematical readers are recommended
to omit the parts printed in small type. In the Appendix
are collected some theorems whose proofs are not elsewhere
very easily accessible, and to it are relegated some parts of
the analysis which are too unwieldy for the text.


Frequency Groups and Curves. 


The remainder of this chapter is devoted to the systematic 
measurement of frequency groups. 


[Diagram: ordinates $M_1P_1$, $M_2P_2$, $M_3P_3$ . . . erected at the points $M_1$, $M_2$, $M_3$ . . . of the axis OX.] 

Let there be any group of measurements such that, an axis OX 
being taken on which a scale is marked, $y_1$ instances are found to 
have the measurement $x_1$, $y_2$ the measurement $x_2$, and so on ; then 
the group can be represented as in the diagram, where $OM_1 = x_1$, 
$M_1P_1 = y_1$, etc. 


* For elementary treatment, see Whitworth's Choice and Chance. 

It is not necessary that the grades $M_1M_2$, $M_2M_3$ . . . should 
be equal. 

Let $n$ be the whole number in the group, so that 

$$n = y_1 + y_2 + \dots$$

Then the "frequencies" of observations at $x_1$, $x_2$, etc., are 

$$\frac{y_1}{n}, \quad \frac{y_2}{n}, \quad \text{etc.}$$

If the points $P_1$, $P_2$, $P_3$ . . . can be regarded as lying on a 
continuous curve, then their locus is a "frequency curve." 

If the measurements do not fall into grades, or in sub-groups 
at particular values, but each observation has a distinctive 
measurement, then the group can be represented by a loaded axis 
on which each item is marked by a dot, 

[Diagram: dots marked at irregular intervals along the axis OX.] 

and a great part of the following formulae is applicable to such 
a loaded line as well as to a frequency curve. 

Measurements of the members of a group are frequently massed 
in grades (as 20-25, 25-30, . . . years) or originally made to the 
nearest unit (as 55-56, 56-57 . . . inches). In such cases the 
number in each grade is approximately represented by a rectangle 
(as on the grade $M_1M_2$). 


[Diagram: rectangles standing on successive grades of the axis OX, their upper corners marked $Q_1$, $Q_2$.] 

Let $h$ be the breadth of each grade, $x_1$, $x_2$ . . . the abscissae of 
their middle points, $y_1$, $y_2$ . . . the altitudes of the rectangles, 
and $y_1h$, $y_2h$ . . . the numbers recorded in the grades. Then 

$$n = y_1h + y_2h + \dots = y_1 + y_2 + \dots$$

if $h$ is taken as the unit. 

The frequencies in the grades are 

$$\frac{y_1h}{n}, \quad \frac{y_2h}{n}, \quad \text{etc.}$$

If a continuous curve can be defined and constructed so that 
the parts of its area standing on $M_1M_2$, $M_2M_3$ . . . are proportional 
to $y_1h$, $y_2h$ . . . , then this is the frequency curve of the group. 

Variation is a general law of nature and is found in most 
human affairs, so that large scale observations usually lead to 
frequency groups. Four classes can be distinguished : (a) where 
every member of a group has been measured, e.g. the 
wages of every adult male working in a trade ; (b) observations 
of samples selected from a group, e.g. the number of 
children in each of 1,000 families chosen in a town where there 
are 50,000 families, or the measurement of leaves of a tree of 
a particular kind ; (c) repeated measurements of a physical 
quantity (e.g. of the declination of a star) where the variations 
are due to instrumental errors ; (d) the mathematical probabilities 
of various numbers of successes (e.g. the chances of 
obtaining 1, 2, 3 . . . heads when 50 coins are tossed) or the 
frequencies of events whose magnitude depends on an unknown 
complex of causes. 

To whichever class the phenomena belong, the same general 
method of describing the group is appropriate. This method 
is to select certain algebraic functions of the x's and y's and 
to evaluate them for the particular group. The group is in 
fact described (1) by determining a central position, (2) by 
measuring the dispersion of the observations from this centre, 

(3) by measuring any want of symmetry about its centre, 

(4) by further measurements depending on the shape of the 
diagram which represents the group. 

For the central position we can use the arithmetic average, 
the median, the mode or, in some cases, the geometric mean. 
The arithmetic average is necessary in most cases in further 
calculations and must be taken as the usual starting point. 
The median does not lend itself readily to general algebraic 
work, is not always known precisely, and need only be calcu- 
lated for special purposes. The mode is not generally deter- 
minable exactly from the observations and the introduction 
of approximation at the beginning of the calculations should 
be avoided ; if, however, we have a definite algebraic formula 
for the group, the mode can be exactly obtained and is often 
important. (Part I, Chapter V.) 

For measurement of dispersion we may use the "probable 
error," i.e., the half-interquartile range, or the mean deviation, 
or the deviation of mean square. Of these the probable error, 
like the median, can often only be found approximately and 
is difficult to use systematically in further measurements. 
The mean deviation, apart from an ambiguity in the position 
of the origin from which the deviations are to be measured, 
introduces in further work a serious difficulty because the 
first measurements are taken irrespective of their sign. The 
deviation of mean square on the other hand is free from all 
these difficulties, being defined uniquely as the square root 
of the average of the squares of the deviations of single 
measurements from their average, and not only is easy to handle 
algebraically, but also necessarily enters into many calculations. 
It is called the standard deviation and is universally 
used in mathematical statistics. (Part I, Chapter VI.) 

Want of symmetry in a curve is indicated by the want of 
coincidence of the median, mode and arithmetic average, and 
by inequality of the distances from the median to the lower 
and upper quartiles. On any such quantities, which are zero 
when the group is symmetrical, a measurement can be based ; 
but the median, mode and quartiles can often only be found 
approximately, a resulting measurement is specially subject 
to any imperfections resulting from paucity of observations, 
and a change in magnitude of an observation has no influence 
if it does not transfer it across the median or a quartile. 

We need a measurement which is sensitive to the position 
of every observation. It would be possible to take the difference 
between the mean deviations of observations above and 
observations below the average, but this would not lead to 
a formula readily put in line with other systematic measurements. 
It is found that the deviation of mean cube (the 
average of the third powers of the deviations of observations 
from their average, taken positively or negatively as they 
occur) is free from all difficulties, and it is evidently sensitive 
to all want of symmetry or "skewness." 

In measuring deviation it is natural and usual to express 
the result in concrete terms as so many inches, lbs., or other 
units, and the standard deviation, probable error, and mean 
deviation are so expressed. But in measuring skewness there 
is no obvious concrete unit and it is convenient to construct 
the measurement so as to be independent of the unit used ; 
this is obtained by expressing the deviation of each observation 
from the average as a multiple of the standard deviation ; 
thus if $x$ is a measurement, $\bar{x}$ the average, and $\sigma$ the standard 
deviation, the quantities averaged are $\left(\frac{x - \bar{x}}{\sigma}\right)^3$ ; the 
resulting measurement of skewness is 

$$\frac{1}{n}\left\{\text{sum of all values of } \left(\frac{x - \bar{x}}{\sigma}\right)^3\right\}.$$

It evidently gives a sensitive measurement, 
but on no obvious scale, and it is only by experience of the 
shapes of curves and the resulting measures of skewness that 
these measures acquire an intelligible meaning. 

Further measurements can be obtained from the mean 
fourth, fifth, and higher powers. These have been generalised 
by Professor Karl Pearson in his system of moments. The 
1st, 2nd, 3rd . . . moments are the mean of the first, second, third 
. . . powers of the deviations ; the deviations may be measured 
from any point and the resulting moments are with respect 
to that point ; but the arithmetic average is generally taken 
as the centre from which measurements are made, and moments 
with regard to other points are only used to facilitate calculation. 

In Part I, Chapter V, it was explained that an average 
was used as a compact way of describing a group, especially 
when it was desired to compare or contrast two groups. This 
conception has now been developed, and we have a systematic 
way of describing the essential characteristics by three or more 
symbols, which measure the average, the standard deviation, 
the skewness and further analogous quantities. As soon as 
the meanings and scales of these measurements are appreciated, 
we may dispense with the original data (keeping them only 
for reference or as diagrams), express groups in a concentrated 
form, and base calculations showing the relations of groups to 
each other on these quantities which are specially adapted to 
mathematical treatment. 

The system is not of universal applicability, and in 
Chapter V are given examples of other methods suitable for 
particular classes of groups. 
Notation of Moments. 

The notation and nomenclature used here are as follows : 

$$m_t' = (x_1^t y_1 + x_2^t y_2 + \dots) \div n = S(x^t y) \div n \qquad (1)$$

is called the $t$th moment of the group about its origin. 

$$n = Sy \qquad (2)$$

$$m_1' = \bar{x} = Sxy \div Sy \qquad (3)$$

is the average of the group. 

$$m_t = S(x - \bar{x})^t y \div n \qquad (4)$$

is the $t$th moment about the average. 

Then 

$$nm_2 = S(x - \bar{x})^2 y = Sx^2y - 2\bar{x}\,Sxy + \bar{x}^2 Sy = nm_2' - 2\bar{x} \cdot n\bar{x} + n\bar{x}^2$$

and 

$$m_2 = m_2' - \bar{x}^2 \qquad (5)$$

$$\sigma = \sqrt{m_2} \qquad (6)$$

is called the standard deviation, as defined above. 

$$nm_3 = Sx^3y - 3\bar{x}\,Sx^2y + 3\bar{x}^2 Sxy - n\bar{x}^3$$

$$m_3 = m_3' - 3\bar{x}m_2' + 2\bar{x}^3 \qquad (7)$$

$m_3$ is zero in a symmetrical curve. To obtain a convenient 
measurement of want of symmetry or skewness the abscissae are 
expressed as multiples of the standard deviation, thus eliminating 
the concrete unit of measurement. 

Thus 

$$\kappa = \frac{1}{n}\,S\left\{\left(\frac{x - \bar{x}}{\sigma}\right)^3 y\right\} = \frac{m_3}{\sigma^3} \qquad (8)$$

is a measurement of skewness.* 

Similarly, 

$$m_4 = m_4' - 4\bar{x}m_3' + 6\bar{x}^2m_2' - 3\bar{x}^4 \qquad (9)$$

is the fourth moment, and 

$$\kappa_2 = \frac{m_4}{\sigma^4} \qquad (10)$$

gives a measurement independent of the unit. 

* The symbol $\kappa$ is introduced in this book in place of letters formerly used 
to measure skewness. It is believed that it will be found convenient. 
The standard deviation being given, the more the members 
of the group are dispersed from the centre, the greater is $\kappa_2$. 
In the particular case of the normal curve of error (p. 269), 
$\kappa_2 = 3$. If without altering $\sigma$ the central height is depressed 
and the outlying parts pushed further out than in the normal 
curve, then $\kappa_2 < 3$. 

Professor Karl Pearson uses $\beta_1 = m_3^2 / m_2^3$, so that $\sqrt{\beta_1} = \kappa$ as 
given above, and he and Mr. Yule use a more elaborate formula 
for skewness. Also he writes $\mu_t$ instead of $m_t$, and $\beta_2$ for $\kappa_2$. 
Professor Edgeworth, following earlier practice, frequently 
uses $c = \sqrt{2m_2}$ (called the modulus), for the unit of reduction 
instead of $\sigma$, so that $c = \sigma\sqrt{2}$. On the whole the saving of 
complexity in some formulae by the use of $c$ may be held not to 
compensate the use of an additional letter, for in any case the 
standard deviation must be used. 

Edgeworth also uses $j$ for $m_3 / c^3$, so that $\kappa = 2\sqrt{2}\,j$, and $i$ for 

$$\frac{m_4}{c^4} - \frac{3}{4} = \frac{1}{4}\left(\frac{m_4}{\sigma^4} - 3\right) = \frac{1}{4}(\kappa_2 - 3).$$

Then $i$ is zero in the normal curve of error. 
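
The definitions (1) to (10) can be sketched in a few lines of code. The following is a minimal illustration only: the function name `describe_group` and the small symmetrical group used to try it are assumptions made for this sketch and are not taken from the text.

```python
from math import sqrt

def describe_group(xs, ys):
    """Return (average, sigma, kappa, kappa2) for measurements xs
    occurring with frequencies ys, following formulae (3) to (10)."""
    n = sum(ys)                                        # n = Sy
    mean = sum(x * y for x, y in zip(xs, ys)) / n      # m1' = Sxy / Sy

    def central(t):
        # t-th moment about the average, formula (4)
        return sum((x - mean) ** t * y for x, y in zip(xs, ys)) / n

    m2, m3, m4 = central(2), central(3), central(4)
    sigma = sqrt(m2)                                   # standard deviation
    return mean, sigma, m3 / sigma ** 3, m4 / sigma ** 4

# Hypothetical symmetrical group, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 6, 4, 1]
mean, sigma, kappa, kappa2 = describe_group(xs, ys)
```

For a symmetrical group such as this the skewness is zero, while the fourth-moment measure falls below the value 3 of the normal curve.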


Illustrations of the Calculation of Moments. 

In the following examples methods of calculating the 
essential measurements $\bar{x}$, $\sigma$, $\kappa$, $\kappa_2$ are given. 

In very few cases has it been found necessary or expedient 
to use higher moments than the fourth for descriptive work, 
and it is well that this is so, for the errors incident to the 
obtaining of higher moments from actual observations are 
generally so considerable as to render them useless. 

1. In the first example a fairly homogeneous group of 
physical measurements is taken, viz., the weights of 3,404 boys 
of nearly the same age. If their heights (given on p. 385) were 
symmetrically distributed, it is to be expected that their weights 
would show a positive skewness, and in fact $\kappa = .643$. One 
boy of exceptional physique (height 5 ft. 4 in., weight 14 stones) 
is excluded in the calculation of moments. The curve is not 
far removed from normality, for $\kappa_2 - 3$ equals only .457. 
Weight of Boys, 14 to 15 Years of Age, Granted Employment 
Certificate in New York. 

Weight.   Scale.   Number.                     Products.
 lbs.       x        y         xy      x^2 y       x^3 y        x^4 y

 65-       -7         3       -21        147      -1,029        7,203
 70-       -6         9       -54        324      -1,944       11,664
 75-       -5       142      -710      3,550     -17,750       88,750
 80-       -4       301    -1,204      4,816     -19,264       77,056
 85-       -3       289      -867      2,601      -7,803       23,409
 90-       -2       380      -760      1,520      -3,040        6,080
 95-       -1       416      -416        416        -416          416
100-        0       404         0          0           0            0
105-        1       315      +315        315        +315          315
110-        2       320      +640      1,280      +2,560        5,120
115-        3       262      +786      2,358      +7,074       21,222
120-        4       221      +884      3,536     +14,144       56,576
125-        5       131      +655      3,275     +16,375       81,875
130-        6        76      +456      2,736     +16,416       98,496
135-        7        52      +364      2,548     +17,836      124,852
140-        8        20      +160      1,280     +10,240       81,920
145-        9        29      +261      2,349     +21,141      190,269
150-       10        14      +140      1,400     +14,000      140,000
155-       11        10      +110      1,210     +13,310      146,410
160-       12         2       +24        288      +3,456       41,472
165-       13         2       +26        338      +4,394       57,122
170-       14         5       +70        980     +13,720      192,080
175-       15         1       +15        225      +3,375       50,625

Totals            3,404    +4,906     37,492    +158,356    1,502,932
                           -4,032                -51,246
                             +874               +107,110

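The origin is taken at 102.5, and the unit as 5 lbs. 

$$\bar{x} = m_1' = \frac{874}{3404} = .2568, \qquad m_1 = 0. \qquad \text{Average} = 102.5 + 5 \times .2568 = 103.78 \text{ lbs.}$$

$$m_2' = \frac{37492}{3404} = 11.014, \qquad m_2 = m_2' - \bar{x}^2 = 10.948$$

$$m_3' = \frac{107110}{3404} = 31.466, \qquad m_3 = m_3' - 3\bar{x}m_2' + 2\bar{x}^3 = 23.016$$

$$m_4' = \frac{1502932}{3404} = 441.519, \qquad m_4 = m_4' - 4\bar{x}m_3' + 6\bar{x}^2m_2' - 3\bar{x}^4 = 413.542$$

$m_2$ corrected,* $= 10.948 - \frac{1}{12} = 10.865$. $\sigma = 3.296$, i.e. 16.48 lbs. 

$m_4$ corrected,* $= m_4 - \frac{1}{2}m_2 + \frac{7}{240} = 413.542 - 5.474 + .029 = 408.10$ 

$$\kappa = \frac{m_3}{\sigma^3} = .643 = \sqrt{\beta_1}, \qquad \kappa_2 = \frac{m_4}{\sigma^4} = 3.457 = \beta_2,$$

$$c = 4.661, \qquad j = \frac{m_3}{c^3} = .227, \qquad i = .114.$$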

* Sheppard's corrections, see Appendix, Note 5, p. 439. 

In the above table and in similar calculations it is assumed that the 
numbers in each grade can be treated as if they were all at the centre of the 
grade. Unless the grading is very fine, this exaggerates perceptibly the 
second and fourth moments, while if the numbers in the extreme grades are 
small the first and third are little affected. If the breadth of the grade is $h$ 
and not taken as unity, the corrected moments are $m_2 - \frac{h^2}{12}$ and 
$m_4 - \frac{h^2}{2}m_2 + \frac{7h^4}{240}$. 

2. Sauerbeck's 45 index numbers measure the movement of 
prices of separate commodities, while their average measures 
the general price movement. The 45 numbers may be regarded 
as measurements of the general movement subject to indi- 
vidual chance deviations, and therefore form a frequency 
group, whose standard deviation can be used to measure the 
precision of the average. The group is moderately unsym- 
metrical. The number of cases is so small that it is not 
worth while to calculate the 4th moment. 


Sauerbeck's Index Numbers of 45 Commodities in 1916. 

Numbers.    x      x^2        x^3      |  Numbers.    x      x^2        x^3

   68      -68    4,624    -314,432    |    138       +2        4          8
   71      -65    4,225    -274,625    |    148      +12      144      1,728
   84      -52    2,704    -140,608    |    148      +12      144      1,728
   86      -50    2,500    -125,000    |    153      +17      289      4,913
   93      -43    1,849     -79,507    |    154      +18      324      5,832
   96      -40    1,600     -64,000    |    154      +18      324      5,832
  100      -36    1,296     -46,656    |    157      +21      441      9,261
  100      -36    1,296     -46,656    |    159      +23      529     12,167
  101      -35    1,225     -42,875    |    159      +23      529     12,167
  104      -32    1,024     -32,768    |    160      +24      576     13,824
  104      -32    1,024     -32,768    |    161      +25      625     15,625
  107      -29      841     -24,389    |    163      +27      729     19,683
  114      -22      484     -10,648    |    163      +27      729     19,683
  114      -22      484     -10,648    |    166      +30      900     27,000
  119      -17      289      -4,913    |    168      +32    1,024     32,768
  121      -15      225      -3,375    |    169      +33    1,089     35,937
  125      -11      121      -1,331    |    172      +36    1,296     46,656
  128       -8       64        -512    |    173      +37    1,369     50,653
  128       -8       64        -512    |    174      +38    1,444     54,872
  131       -5       25        -125    |    183      +47    2,209    103,823
  132       -4       16         -64    |    197      +61    3,721    226,981
  135       -1        1          -1    |    202      +66    4,356    287,496
  135       -1        1          -1    |

   23     -632   25,982  -1,256,414    |     22     +629   22,795    988,637

Totals :   45       -3   48,777    -267,777

Origin at 136. 

$$\bar{x} = -\frac{3}{45} = -.07. \qquad \text{Average} = 136 - .07 = 135.93$$

$$m_2' = \frac{48777}{45} = 1083.933, \qquad m_2 = m_2' - \bar{x}^2 = 1083.929, \qquad \sigma = \sqrt{m_2} = 32.9$$

$$m_3' = -\frac{267777}{45} = -5951, \qquad m_3 = m_3' - 3\bar{x}m_2' + 2\bar{x}^3 = -5734, \qquad \kappa = \frac{m_3}{\sigma^3} = -.16$$
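
Where the observations are not grouped, the moments may be taken directly from the separate items. The sketch below does so for the 45 index numbers; a few of the numbers are blurred in the printed table and have here been inferred from their recorded deviations from 136 (an assumption of this sketch, flagged in the comments).

```python
from math import sqrt

# The 45 index numbers. The entries 71, 84, 125, 131 and 132 are inferred
# from the deviations -65, -52, -11, -5 and -4 recorded against them.
numbers = [68, 71, 84, 86, 93, 96, 100, 100, 101, 104, 104, 107, 114, 114,
           119, 121, 125, 128, 128, 131, 132, 135, 135,
           138, 148, 148, 153, 154, 154, 157, 159, 159, 160, 161, 163, 163,
           166, 168, 169, 172, 173, 174, 183, 197, 202]

origin = 136
devs = [v - origin for v in numbers]
n = len(devs)

xbar = sum(devs) / n                                   # first moment about 136
m2_raw = sum(d ** 2 for d in devs) / n
m3_raw = sum(d ** 3 for d in devs) / n

m2 = m2_raw - xbar ** 2                                # formula (5)
m3 = m3_raw - 3 * xbar * m2_raw + 2 * xbar ** 3        # formula (7)

average = origin + xbar
sigma = sqrt(m2)
kappa = m3 / sigma ** 3
```

No correction for grouping is needed here, since each item is entered at its own value.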
3. Observations of the Right Ascension of the 

Seconds from     x     Number of       xy     x^2 y     x^3 y     x^4 y
assumed mean.          Observations.
                            y

    +3.0         6           1           6       36       216     1,296
    +2.5         5           5          25      125       625     3,125
    +2.0         4          16          64      256     1,024     4,096
    +1.5         3          38         114      342     1,026     3,078
    +1.0         2          63         126      252       504     1,008
    +0.5         1          72          72       72        72        72
     0.0         0          82           0        0         0         0
    -0.5        -1          73         -73       73       -73        73
    -1.0        -2          61        -122      244      -488       976
    -1.5        -3          36        -108      324      -972     2,916
    -2.0        -4          21         -84      336    -1,344     5,376
    -2.5        -5          12         -60      300    -1,500     7,500
    -3.0        -6           6         -36      216    -1,296     7,776
    -3.5        -7           1          -7       49      -343     2,401

Totals                     487        +407    2,625    +3,467    39,693
                                       -83             -2,549

$$\bar{x} = -\frac{83}{487} = -.170$$

$$m_2 = 5.390 - .029 = 5.361, \qquad \sigma = 2.3$$

$$m_3 = -5.234 - 3(-.170) \times 5.390 + 2(-.170)^3 = -2.49, \qquad \kappa = -.2$$

$$m_4 = 81.505 - 4(-.170)(-5.234) + 6(.170)^2(5.390) - 3(.170)^4 = 78.88, \qquad \kappa_2 = 2.7$$

These observations have been frequently used in discussing 
how far physical observations can be expressed by the normal 
curve. The results are nearly symmetrical, but since $\kappa_2 < 3$ 
there is an under-concentration near the average. 

4. The following example shows how a table of chances can 
be treated as a frequency group ; an unsymmetrical case has 
been selected, namely the chance of obtaining sixes in a throw 
of 12 dice ; e.g., the chance of exactly 3 sixes is 

$${}_{12}C_3 \left(\tfrac{1}{6}\right)^3 \left(\tfrac{5}{6}\right)^9 ;$$

see p. 262. 

Number of Sixes.       Chance.
       x                  y

       0            244,140,625 ÷ 6^12
       1            585,937,500    ,,
       2            644,531,250    ,,
       3            429,687,500    ,,
       4            193,359,375    ,,
       5             61,875,000    ,,
       6             14,437,500    ,,
       7              2,475,000    ,,
       8                309,375    ,,
       9                 27,500    ,,
      10                  1,650    ,,
      11                     60    ,,
      12                      1    ,,

                  2,176,782,336

$\bar{x} = 2$ ; $m_2 = 1\frac{2}{3}$ ; $m_3 = 1\frac{1}{9}$ ; $m_4 = 8\frac{11}{18}$ ; $\sigma = 1.29$ ; $\kappa = .516$ ; $\kappa_2 = 3.1$. 
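
A table of chances of this kind is readily generated and its moments found exactly. The following sketch uses exact fractions; the names are illustrative, and `math.comb` supplies the number of combinations.

```python
from fractions import Fraction
from math import comb

n = 12                       # number of dice
p = Fraction(1, 6)           # chance of a six with one die
q = 1 - p

# Chance of exactly r sixes, r = 0 ... 12.
chances = [comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]

mean = sum(r * c for r, c in enumerate(chances))

def central(t):
    """t-th moment of the table of chances about its average."""
    return sum((r - mean) ** t * c for r, c in enumerate(chances))

m2, m3, m4 = central(2), central(3), central(4)
kappa2 = m4 / m2 ** 2        # fourth-moment measure, formula (10)
kappa = float(m3) / float(m2) ** 1.5
```

Multiplying any entry of `chances` by 6 to the twelfth power recovers the whole numbers printed in the table.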



* Quetelet, Lettres sur la théorie des probabilités, p. 128. 

5. If digits are selected at random their average may be 
expected to tend to 4.5. The group in the table below shows 
the result of selecting 400 groups of 25 each from the last 
digits in 7 figure logarithm tables. The group is somewhat 
unsymmetrical and $\kappa_2 > 3$. 

Sum of 25 Digits, divided by 5. 

Difference          Number of
from 22.5.          Cases.

Over 9                  1
 8 to 9                 5
 7 ,, 8                 9
 6 ,, 7                 5
 5 ,, 6                12
 4 ,, 5                10
 3 ,, 4                15
 2 ,, 3                36
 1 ,, 2                48
 0 ,, 1                57
 0 ,, -1               62
-1 ,, -2               58
-2 ,, -3               39
-3 ,, -4               17
-4 ,, -5               13
-5 ,, -6               10
-6 ,, -7                2
-7 ,, -8                1

                      400

With origin 23, 

$\bar{x} = -.2575$ ; average, 22.7425 
$m_2 = 8.8662$ ; corrected, 8.78 
$\sigma = 2.964$ 
$m_3 = 13.584$ ; $\kappa = .522$ 
$m_4 = 274.24$ ; corrected, 269.8 
$\kappa_2 = 3.50$ 


Mr. Elderton * gives a method of calculating moments specially 
suited for work on an adding and multiplying machine, 
which may be expressed as follows in the notation of this 
chapter. 

* Frequency Curves and Correlation, pp. 19-23. On p. 23 Mr. Elderton shows 
how to use an origin near the centre, thereby saving numerical work. See 
also Hardy, The Theory of the Construction of Tables of Mortality, pp. 59 seq. 

Let $y_1, y_2 \dots y_t$ be the frequencies at $x = 1, 2 \dots t$. 

Write ${}_0S_1 = y_t$, ${}_0S_2 = y_t + y_{t-1}$, . . . ${}_0S_t = y_t + y_{t-1} + \dots + y_1$. 

Also write 

${}_1S_2 = {}_0S_1 + {}_0S_2$, ${}_1S_3 = {}_0S_1 + {}_0S_2 + {}_0S_3$, . . . , ${}_1S_t = {}_0S_1 + {}_0S_2 + \dots + {}_0S_t$, 
and ${}_2S_2 = {}_1S_1 + {}_1S_2$, . . . , ${}_2S_t = {}_1S_1 + {}_1S_2 + \dots + {}_1S_t$, and so on. 

$${}_0S_t = \text{number of observations} = n$$

$${}_1S_t = ty_t + (t-1)y_{t-1} + \dots + 1 \cdot y_1 = n\bar{x},$$

where $\bar{x}$ is the average, $= m_1'$ ; 

$${}_2S_t = (1 + 2 + \dots + t)\,y_t + (1 + 2 + \dots + \overline{t-1})\,y_{t-1} + \dots + (1 + 2)\,y_2 + y_1$$

$$= \frac{t(t+1)}{1 \cdot 2}\,y_t + \frac{(t-1)t}{1 \cdot 2}\,y_{t-1} + \dots + \frac{1 \cdot 2}{1 \cdot 2}\,y_1 = \frac{n}{2}(m_2' + m_1'),$$

where $m_1'$, $m_2'$ . . . are moments about the origin. 

$${}_3S_t = \frac{1}{2}\{1 \cdot 2 + 2 \cdot 3 + \dots + t(t+1)\}\,y_t + \frac{1}{2}\{1 \cdot 2 + \dots + (t-1)t\}\,y_{t-1} + \dots$$

$$= \frac{1}{6}\{t(t+1)(t+2)\,y_t + (t-1)t(t+1)\,y_{t-1} + \dots + 1 \cdot 2 \cdot 3\,y_1\}$$

$$= \frac{n}{6}(m_3' + 3m_2' + 2m_1'), \text{ and}$$

$${}_4S_t = \frac{n}{24}(m_4' + 6m_3' + 11m_2' + 6m_1').$$

Then by the use of equations 5, 7, and 9 we find 

$$m_2 = \frac{2}{n} \cdot {}_2S_t - \bar{x}(1 + \bar{x})$$

$$m_3 = \frac{6}{n} \cdot {}_3S_t - 3m_2(1 + \bar{x}) - \bar{x}(1 + \bar{x})(2 + \bar{x})$$

$$m_4 = \frac{24}{n} \cdot {}_4S_t - 2m_3(3 + 2\bar{x}) - m_2(11 + 18\bar{x} + 6\bar{x}^2) - \bar{x}(1 + \bar{x})(2 + \bar{x})(3 + \bar{x}).$$

The quantities ${}_1S_t$, ${}_2S_t$, ${}_3S_t$, ${}_4S_t$ are quickly obtained by 
repeated addition. The process is exhibited sufficiently by 
working out the moments of Example 5 (p. 256) by this method. 

$x$ is measured from the origin 14 ; $y_x$ is the number of cases 
at $x$. Each term in the column ${}_0S_x$ is obtained by adding the 
terms in the previous column that stand to the left and above 
it ; the column ${}_1S_x$ is obtained similarly from the column 
${}_0S_x$, and so on. 

$t = 18$. 

Write 19 




Sum of digits ÷ 5      x     y_x     0S_x     1S_x      2S_x      3S_x

Over 31.5             18       1        1        1         1         1
     30.5             17       5        6        7         8         9
     29.5             16       9       15       22        30        39
     28.5             15       5       20       42        72       111
     27.5             14      12       32       74       146       257
     26.5             13      10       42      116       262       519
     25.5             12      15       57      173       435       954
     24.5             11      36       93      266       701     1,655
     23.5             10      48      141      407     1,108     2,763
     22.5              9      57      198      605     1,713     4,476
     21.5              8      62      260      865     2,578     7,054
     20.5              7      58      318    1,183     3,761    10,815
     19.5              6      39      357    1,540     5,301    16,116
     18.5              5      17      374    1,914     7,215    23,331
     17.5              4      13      387    2,301     9,516    32,847
     16.5              3      10      397    2,698    12,214    45,061
     15.5              2       2      399    3,097    15,311    60,372
     14.5              1       1      400    3,497    18,808    79,180

Totals                       400    3,497   18,808    79,180   285,560
                          = 0S_t   = 1S_t   = 2S_t    = 3S_t    = 4S_t

$$\bar{x} = \frac{1}{400} \times 3497 = 8.7425. \qquad \text{Average} = 22.7425$$

$$m_2 = \frac{2}{400} \times 18808 - 8.7425 \times 9.7425 = 8.8662$$

$$m_3 = \frac{6}{400} \times 79180 - 3 \times 8.8662 \times 9.7425 - 8.7425 \times 9.7425 \times 10.7425 = 13.584$$

$$m_4 = \frac{24}{400} \times 285560 - 2 \times 13.584 \times 20.485 - 8.8662 \times 626.95 - 10744.13 = 274.24$$
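
The method of repeated addition lends itself to mechanical statement. The sketch below is one possible rendering, using `itertools.accumulate` for the running sums; the variable names are assumptions of this illustration, and the frequencies are those of Example 5 taken with the origin at 14.

```python
from itertools import accumulate

# y[0] is the frequency at x = 1, ..., y[17] the frequency at x = 18 = t.
y = [1, 2, 10, 13, 17, 39, 58, 62, 57, 48, 36, 15, 10, 12, 5, 9, 5, 1]
n = sum(y)

# Column 0S: running sums beginning from the frequency at x = t.
column = list(accumulate(reversed(y)))

# The total of each column is the last term of the next column,
# which gives 1S_t, 2S_t, 3S_t, 4S_t in succession.
totals = []
for _ in range(4):
    totals.append(sum(column))
    column = list(accumulate(column))
S1, S2, S3, S4 = totals

xbar = S1 / n
m2 = 2 * S2 / n - xbar * (1 + xbar)
m3 = 6 * S3 / n - 3 * m2 * (1 + xbar) - xbar * (1 + xbar) * (2 + xbar)
m4 = (24 * S4 / n - 2 * m3 * (3 + 2 * xbar)
      - m2 * (11 + 18 * xbar + 6 * xbar ** 2)
      - xbar * (1 + xbar) * (2 + xbar) * (3 + xbar))

# Direct calculation of the fourth moment about the average, for comparison.
xs = range(1, len(y) + 1)
m4_direct = sum((x - xbar) ** 4 * f for x, f in zip(xs, y)) / n
```

The moments so found are those before Sheppard's corrections are applied, and the direct calculation gives the same fourth moment.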



CHAPTER II. 


ALGEBRAIC PROBABILITY AND THE NORMAL CURVE 

OF ERROR. 

Elementary Principles . 

The method and fundamental theorems of algebraic proba- 
bility may be summarised as follows : — 

Suppose that there are N alternative events, any one of 
which is just as likely to take place as any other, and that 
one of them is known to have taken place, but we are in 
complete ignorance which ; further, of the N events suppose 
that M have a special characteristic and the remaining (N — M) 
have not ; then the chance that the event that has happened 
has this characteristic is defined as $\frac{M}{N}$. 

Thus, if one card has been drawn from an ordinary pack of 
52, the chance that it is a heart is $\frac{13}{52} = \frac{1}{4}$. Here each of the 
52 events is so far as we know equally likely, and the skill of 
the card manufacturer is directed to make the cards of equal 
weight and with equal friction. We cannot point to any circum- 
stance which tends to give one card rather than another, unless 
the surface friction of an ace is less than that of a king. In 
an ideal system there is nothing to distinguish the circum- 
stances that lead to one of the N events rather than another. 
In the apparatus of fair games of chance this equality is 
definitely aimed at, and consequently such games supply 
illustrations of algebraic probability. 

Let $p = \frac{M}{N}$ ; $q = 1 - p = \frac{N - M}{N}$. $q$ is the chance that the 
characteristic will not be found. If we call the appearance of the 
characteristic a "success," $p$ is the chance of success, $q$ is the 
chance of failure ; the odds in favour are $p$ to $q$, against $q$ to $p$. 

Multiplication of Chances. 

If $p_1$, $p_2$ are the chances of success in two independent experiments, 
then $p_1 \times p_2$ can be shown as follows to be the chance of a 
double success. 

In one experiment let there be $n_1$ equally likely alternative 
events, and in the other $n_2$. Write $p_1 = \frac{m_1}{n_1}$, $p_2 = \frac{m_2}{n_2}$. 

By independence here we mean that the result of the first 
experiment has no effect on the second experiment, so that each of 
the $n_1 \times n_2$ possible double events is equally likely. 

Of these $n_1 \times n_2$ events 

    $m_1 \times m_2$ give a double success, 
    $m_1 \times (n_2 - m_2)$ give success and failure, 
    $(n_1 - m_1) \times m_2$ give failure and success, 
    $(n_1 - m_1) \times (n_2 - m_2)$ give double failure. 

Of $n_1n_2$ equally likely events $m_1m_2$ give a double success and 
the remainder do not. Hence $p$, the chance of double success, 

$$= \frac{m_1m_2}{n_1n_2} = p_1 \times p_2.$$

E.g. the chance that two sixes will be thrown by a pair of dice 
$= \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}$. 


If, however, the experiments are not independent, but the 
result of the first affects the chances in the second, the formula 
must be modified in the way illustrated by the following 
example. 

If a card is drawn from each of two packs the chance 
of drawing two aces is $\frac{4}{52} \times \frac{4}{52} = \frac{1}{169}$, where $p_1 = \frac{4}{52} = p_2$. 

But if the second card is drawn from a pack from which 
the first has already been taken, we have the following 
alternatives : — 

There are 52 x 51 possible events. 

If an ace is drawn first, there are 3 aces in the remaining 51. 

4x3 ways give a double success. 4 x 48 give success 
and failure ; 48 x 4 give failure and success, and 48 x 47 give 
double failure. 

The chance of a double success is therefore $\frac{4}{52} \times \frac{3}{51} = \frac{1}{221}$. 

This problem may also be worked out as follows. There 
are ${}_{52}C_2 = \frac{52 \cdot 51}{1 \cdot 2}$ pairs in the pack. Of these ${}_4C_2 = \frac{4 \cdot 3}{1 \cdot 2}$ are 
two aces. Any pair is as likely to be drawn as any other. 
Hence the chance of drawing two aces, whether together or 
consecutively, is 

$$\frac{{}_4C_2}{{}_{52}C_2} = \frac{4 \cdot 3}{52 \cdot 51}.$$
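
Both ways of reckoning the chance of two aces can be verified by direct enumeration. The sketch below is an illustration only; it numbers the 52 cards, treats the first four as the aces (an arbitrary labelling assumed for the purpose), and counts the equally likely ordered draws.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

# Cards 0-3 stand for the four aces, cards 4-51 for the rest of the pack.
is_ace = [True] * 4 + [False] * 48

# The 52 x 51 equally likely ordered draws of two cards from one pack.
ordered = list(permutations(range(52), 2))
double_success = sum(1 for a, b in ordered if is_ace[a] and is_ace[b])

chance_ordered = Fraction(double_success, len(ordered))
chance_pairs = Fraction(comb(4, 2), comb(52, 2))       # unordered pairs
chance_two_packs = Fraction(4, 52) * Fraction(4, 52)   # independent packs
```

The ordered and unordered counts lead to the same chance, while the independent case of two separate packs gives the larger value.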

The chance of obtaining 8 hearts and 5 cards of other suits 
in a hand of 13 cards dealt from 52 is 

$$\frac{{}_{13}C_8 \times {}_{39}C_5}{{}_{52}C_{13}} = \frac{13 \cdot 12 \cdot 11 \cdot 10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 39 \cdot 38 \cdot 37 \cdot 36 \cdot 35 \; (13!)}{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48 \cdot 47 \cdot 46 \cdot 45 \cdot 44 \cdot 43 \cdot 42 \cdot 41 \cdot 40 \; (8!)(5!)}$$

$$= \frac{105{,}857{,}037}{90{,}716{,}222{,}800} = \frac{1}{857} \text{ approx.}$$

for there are ${}_{52}C_{13}$ equally likely hands $= N$ ; there are ${}_{13}C_8$ 
equally likely groups of 8 hearts and ${}_{39}C_5$ equally likely groups 
of 5 from other suits, and $M = {}_{13}C_8 \times {}_{39}C_5$, where $p = \frac{M}{N}$. 
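
The numbers of combinations in this example are large enough to make a mechanical check worth while. A minimal sketch, with illustrative names, follows.

```python
from fractions import Fraction
from math import comb

M = comb(13, 8) * comb(39, 5)    # hands with 8 hearts and 5 other cards
N = comb(52, 13)                 # all equally likely hands of 13 cards
p = Fraction(M, N)               # chance of such a hand, reduced to lowest terms

one_in = float(1 / p)            # the chance expressed as "1 in so many"
```

The fraction reduces to the value printed above, or about one hand in 857.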


Addition of Chances. 

The total 9 can be obtained from the throw of two dice 
from either of the pairs (3, 6) (4, 5) (5, 4) (6, 3) ; that is, of 
36 equally probable events 4 give the result, and the chance is 
therefore $\frac{4}{36} = \frac{1}{9}$. 

This result may also be obtained thus : the chance of throwing 
3 is $\frac{1}{6}$, of throwing 6 is $\frac{1}{6}$, and therefore the chance of throwing 
3 and 6 is $\frac{1}{36}$. Similarly the chance of throwing (4, 5) (5, 4) 
and (6, 3) is $\frac{1}{36}$ in each case. The whole chance is the sum 
of the chances of these alternative double events. 

Generally if a success can be obtained either from an 
occurrence whose chance is $p_1$ followed by one whose chance 
is $p_1'$, or from successive occurrences whose chances are 
$p_2$, $p_2'$ . . . , then the whole chance of a success is 

$$P = p_1p_1' + p_2p_2' + \dots$$
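
The addition of chances for the total 9 may be confirmed by listing the 36 equally probable throws. The sketch below is illustrative only and its names are not drawn from the text.

```python
from fractions import Fraction
from itertools import product

# All 36 equally probable results of throwing two dice.
throws = list(product(range(1, 7), repeat=2))
favourable = [t for t in throws if sum(t) == 9]

# Chance by direct counting of favourable events.
chance = Fraction(len(favourable), len(throws))

# Chance by adding the chances 1/6 x 1/6 of each alternative double event.
by_addition = sum(Fraction(1, 6) * Fraction(1, 6) for _ in favourable)
```

The two reckonings necessarily agree, since each favourable pair contributes one thirty-sixth.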


Deduction of the Normal Law of Error . 

We can now proceed to a general theorem of great im- 
portance alike in the theory of probability itself and in its 
application to statistics. 

Suppose an experiment (e.g. throwing dice, drawing a card, 
or choosing a number) to be such that the chance of success is 
always p and of failure q, so that p + q = 1. 

Let the experiment be repeated $n$ times, and consider the 
chance of obtaining $r$ successes and $n - r$ failures. The chance 
in an order assigned thus (the first $r$ experiments successes and 
the rest failures) is 

$$p \times p \times \dots \text{ to } r \text{ factors} \times q \times q \times \dots \text{ to } (n - r) \text{ factors} = p^r q^{n-r} ;$$

and the chance in any other assigned order is the same. The 
order may be assigned by choosing any $r$ positions for 
successes in a series of $n$ experiments, i.e. in ${}_nC_r$ ways. Hence 
the whole chance is ${}_nC_r \cdot p^r q^{n-r}$. 

The chances of 0, 1, 2 . . . $n$ successes are therefore the 
successive terms of the binomial expansion 

$$1 = (q + p)^n = q^n + n\,q^{n-1}p + \dots + {}_nC_r\,q^{n-r}p^r + \dots + n\,q\,p^{n-1} + p^n.$$

For example, if $p = \frac{2}{5}$, $q = \frac{3}{5}$ and $n = 10$ we have 

  r      nCr      p^r q^(n-r)          nCr p^r q^(n-r)

  0        1      3^10 ÷ 5^10          .006,046,617,6
  1       10      2 x 3^9    ,,        .040,310,784,0
  2       45      2^2 x 3^8  ,,        .120,932,352,0
  3      120      2^3 x 3^7  ,,        .214,990,848,0
  4      210      2^4 x 3^6  ,,        .250,822,656,0
  5      252      2^5 x 3^5  ,,        .200,658,124,8
  6      210      2^6 x 3^4  ,,        .111,476,736,0
  7      120      2^7 x 3^3  ,,        .042,467,328,0
  8       45      2^8 x 3^2  ,,        .010,616,832,0
  9       10      2^9 x 3    ,,        .001,572,864,0
 10        1      2^10       ,,        .000,104,857,6

                                      1.000,000,000,0

[Diagram: the chances of 0, 1, 2 . . . 10 successes drawn as a frequency group.] 
The vertical scale is expanded 100 fold so that the area of the figure 
is 100 squares on unit base. 
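
The terms of the expansion are easily generated. The sketch below takes $p = \frac{2}{5}$ and $q = \frac{3}{5}$, the values implied by the chances printed in the table, and works in exact fractions; the names are illustrative.

```python
from fractions import Fraction
from math import comb

n = 10
p = Fraction(2, 5)     # chance of success, as implied by the printed table
q = 1 - p              # chance of failure

# Chance of exactly r successes in n trials: nCr * q^(n-r) * p^r.
terms = [comb(n, r) * q ** (n - r) * p ** r for r in range(n + 1)]

most_likely = terms.index(max(terms))   # number of successes of greatest chance
```

The terms sum to unity, as they must, and the greatest chance falls at 4 successes, the value of $np$.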
The diagram illustrates the relative chances of different 
numbers of successes, and exhibits them as a frequency group. 

We will first find the moments of the group for general 
values of $p$ and $n$. Take the horizontal scale on the diagram 
as the scale for $x$. 

Suppose the $n$-fold experiment repeated $N$ times, where $N$ is a 
very large number. Then the number of times $r$ successes are 
obtained tends to be $N \times {}_nC_r\,q^{n-r}p^r = y_r$, say, 

and $y_0 + y_1 + \dots + y_n = N(q + p)^n = N$, since $p + q = 1$. 

$\bar{x} = m_1'$, the first moment about the origin, 

$$= (y_0 \times 0 + y_1 \times 1 + \dots + y_r \times r + \dots + y_n \times n) \div N$$

$$= n\,q^{n-1}p + n(n-1)\,q^{n-2}p^2 + \dots + {}_nC_r\,q^{n-r}p^r \times r + \dots + p^n \times n$$

$$= np(q + p)^{n-1} = np \qquad (11)$$

$$m_2' = (y_0 \times 0^2 + y_1 \times 1^2 + \dots + y_r \times r^2 + \dots + y_n \times n^2) \div N$$

$$= \sum_{0}^{n} r^2 \cdot {}_nC_r\,q^{n-r}p^r = \sum_{0}^{n} \{r(r-1) + r\}\,{}_nC_r\,q^{n-r}p^r$$

$$= n(n-1)p^2 \sum_{2}^{n} \frac{(n-2)!}{(n-r)!\,(r-2)!}\,q^{n-r}p^{r-2} + np \sum_{1}^{n} \frac{(n-1)!}{(n-r)!\,(r-1)!}\,q^{n-r}p^{r-1}$$

$$= n(n-1)p^2(q + p)^{n-2} + np(q + p)^{n-1} = n(n-1)p^2 + np$$

$$= n^2p^2 + np(1 - p) = \bar{x}^2 + npq \qquad (12)$$

and $m_2$, the second moment about the average, 

$$= m_2' - \bar{x}^2 = npq = np(1 - p) \qquad (13)$$

In a similar way 

$$m_3' = \sum_{0}^{n} r^3 \cdot {}_nC_r\,q^{n-r}p^r = n(n-1)(n-2)p^3 + 3n(n-1)p^2 + np \qquad (14)$$

and $m_3$, the third moment about the average, 

$$= m_3' - 3\bar{x}m_2' + 2\bar{x}^3$$

$$= n(n-1)(n-2)p^3 + 3n(n-1)p^2 + np - 3n^3p^3 - 3n^2p^2(1 - p) + 2n^3p^3$$

$$= np(2p^2 - 3p + 1) = np(1 - p)(1 - 2p) = npq(q - p) \qquad (15)$$

$m_4' = \sum_{0}^{n} r^4 \cdot {}_nC_r\,q^{n-r}p^r$, and $m_4$ can be shown to equal 

$$3(pqn)^2 + pqn(1 - 6pq).$$
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Hen6e, using the formulae of pp. 251-2, <r = Vpqn t , 

pqn 

n = /? = 3 + c — V 2 pqn, k = - jZA , i = 1 » 

* * Pqn ™ Vpqn 4pqn 

The standard deviation varies as $\sqrt{n}$. $\kappa$ and $\sqrt{\beta_1}$, measurements of skewness, are small when $\sqrt{n}$ is great. $(\beta_2 - 3)$ and $\iota$ are small when $n$ is great.
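The closed forms (11), (13) and (15), and the stated value of $m_4$, can be checked by direct summation over the binomial terms. The following is only a sketch in Python; the function name and the values n = 10, p = 0.4 are chosen for illustration and are not part of the text.

```python
from math import comb

def binomial_central_moments(n, p):
    """Mean and 2nd, 3rd, 4th moments about the mean of the number of
    successes in n trials, found by summing over every binomial term."""
    q = 1.0 - p
    probs = [comb(n, r) * q ** (n - r) * p ** r for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(probs))
    m2, m3, m4 = (
        sum((r - mean) ** k * pr for r, pr in enumerate(probs))
        for k in (2, 3, 4)
    )
    return mean, m2, m3, m4

# Illustrative values only (assumed for this sketch).
n, p = 10, 0.4
q = 1.0 - p
summed = binomial_central_moments(n, p)

# Closed forms: np, npq, npq(q - p), 3(pqn)^2 + pqn(1 - 6pq).
closed = (
    n * p,
    n * p * q,
    n * p * q * (q - p),
    3 * (n * p * q) ** 2 + n * p * q * (1 - 6 * p * q),
)

for name, s, c in zip(("mean", "m2", "m3", "m4"), summed, closed):
    print(name, round(s, 6), round(c, 6))
```

The two columns printed should agree to rounding error for any n and p tried.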

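Next consider the chance of $r$ successes and the shape assumed by the diagram when $n$ is increased.

Case I., when $p = q = \frac{1}{2}$ and $n$ is even $= 2n'$.

Let $P_x$ be the chance of $n' + x$ successes, and therefore $n' - x$ failures.

$$P_x = {}_{2n'}C_{n'+x}\cdot\frac{1}{2^{n'+x}}\cdot\frac{1}{2^{n'-x}} = \frac{(2n')!}{(n'+x)!\,(n'-x)!}\cdot\frac{1}{2^{2n'}}$$

$$= \frac{(2n')!}{n'!\,n'!}\cdot\frac{1}{2^{2n'}}\cdot\frac{n'(n'-1)\ldots(n'-x+1)}{(n'+1)(n'+2)\ldots(n'+x)}$$

$$P_0 = \frac{(2n')!}{2^{2n'}\,n'!\,n'!} = \frac{1}{\sqrt{\pi n'}},$$

by Wallis's Theorem, approximately (Appendix, Note 1 (132)).

$$\frac{P_x}{P_0} = P_x\sqrt{\pi n'} = \frac{\left(1-\frac{1}{n'}\right)\left(1-\frac{2}{n'}\right)\ldots\left(1-\frac{x-1}{n'}\right)}{\left(1+\frac{1}{n'}\right)\left(1+\frac{2}{n'}\right)\ldots\left(1+\frac{x}{n'}\right)}$$

$$\log\left(P_x\sqrt{\pi n'}\right) = \log\left(1-\frac{1}{n'}\right) - \log\left(1+\frac{1}{n'}\right) + \log\left(1-\frac{2}{n'}\right) - \log\left(1+\frac{2}{n'}\right) + \ldots + \log\left(1-\frac{x}{n'}\right) - \log\left(1+\frac{x}{n'}\right) - \log\left(1-\frac{x}{n'}\right)$$

$$= -2\left(\frac{1}{n'} + \frac{1}{3n'^3} + \ldots\right) - 2\left(\frac{2}{n'} + \frac{2^3}{3n'^3} + \ldots\right) - \ldots - 2\left(\frac{x}{n'} + \frac{x^3}{3n'^3} + \ldots\right) - \log\left(1-\frac{x}{n'}\right)$$

$$= -2\,\frac{1 + 2 + \ldots + x}{n'} - \frac{2}{3}\cdot\frac{1^3 + 2^3 + \ldots + x^3}{n'^3} - \ldots - \frac{2}{2t+1}\cdot\frac{1^{2t+1} + 2^{2t+1} + \ldots + x^{2t+1}}{n'^{2t+1}} - \ldots - \log\left(1-\frac{x}{n'}\right),$$

where $t$ is any integer,

$$= -\frac{x(x+1)}{n'} - \frac{2}{3}\cdot\frac{x^2(x+1)^2}{4n'^3} - \ldots - \frac{2}{2t+1}\cdot\frac{x^{2t+2} + \ldots}{(2t+2)\,n'^{2t+1}} - \ldots + \frac{x}{n'} + \frac{x^2}{2n'^2} + \ldots$$

Write $x = \tau\sqrt{n'} = \tau c$, since from p. 252

$$c^2 = 2pqn = 2\cdot\tfrac{1}{2}\cdot\tfrac{1}{2}\cdot 2n' = n'.$$

$$\log\left(P_x\cdot c\sqrt{\pi}\right) = -\tau^2 - \frac{\tau}{\sqrt{n'}} - \frac{\tau^2}{6n'}\left(\tau + \frac{1}{\sqrt{n'}}\right)^2 - \ldots - \frac{2}{(2t+1)(2t+2)}\left(\frac{\tau^{2t+2}}{n'^{\,t}} + \ldots\right) - \ldots + \left(\frac{\tau}{\sqrt{n'}} + \frac{\tau^2}{2n'} + \ldots\right)$$

$$= -\tau^2 + \text{terms involving } \frac{1}{n'}.$$

Hence, to the same approximation as in the value of $P_0$ above,

$$P_x = \frac{1}{c\sqrt{\pi}}\,e^{-\tau^2} = \frac{1}{c\sqrt{\pi}}\,e^{-\frac{x^2}{c^2}} = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}} \qquad (16)$$

since $\sigma$, the standard deviation, $= c/\sqrt{2}$.

And since $c^2 = n' = \frac{n}{2}$,

$$P_x = \frac{2}{\sqrt{2\pi n}}\,e^{-\frac{2x^2}{n}}.$$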

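Case II., when $p$ and $q$ are unequal.

Let $P_x$ be the chance of $pn + x$ successes,† and therefore $qn - x$ failures.

$$P_x = \frac{n!}{(pn+x)!\,(qn-x)!}\,p^{pn+x}q^{qn-x} = \frac{n!}{(pn)!\,(qn)!}\,p^{pn}q^{qn}\cdot\frac{qn(qn-1)\ldots(qn-x+1)}{(pn+1)(pn+2)\ldots(pn+x)}\cdot\frac{p^x}{q^x}$$

$$= P_0\cdot\frac{\left(1-\frac{1}{qn}\right)\left(1-\frac{2}{qn}\right)\ldots\left(1-\frac{x-1}{qn}\right)}{\left(1+\frac{1}{pn}\right)\left(1+\frac{2}{pn}\right)\ldots\left(1+\frac{x}{pn}\right)}$$

* Appendix 2, formula (133).

† It is assumed for simplicity in the sequel that $pn$ is integral and therefore $P_0$ the greatest term; since $n$ is large and powers of $\frac{1}{n}$ are finally neglected, the proof is not affected.

$$\log\frac{P_x}{P_0} = \sum_{s=1}^{x}\log\left(1-\frac{s}{qn}\right) - \sum_{s=1}^{x}\log\left(1+\frac{s}{pn}\right) - \log\left(1-\frac{x}{qn}\right)$$

$$= -\sum_{1}^{x} s\left(\frac{1}{qn}+\frac{1}{pn}\right) - \frac{1}{2}\sum_{1}^{x} s^2\left(\frac{1}{q^2n^2}-\frac{1}{p^2n^2}\right) - \frac{1}{3}\sum_{1}^{x} s^3\left(\frac{1}{q^3n^3}+\frac{1}{p^3n^3}\right) - \ldots - \log\left(1-\frac{x}{qn}\right)$$

$$= -\frac{x(x+1)}{2}\cdot\frac{p+q}{pqn} - \frac{x(x+1)(2x+1)}{6}\cdot\frac{p^2-q^2}{2p^2q^2n^2} - \frac{x^2(x+1)^2}{4}\cdot\frac{p^3+q^3}{3p^3q^3n^3} - \ldots - \frac{x^{t+1}+\ldots}{t+1}\cdot\frac{p^t \pm q^t}{t\,p^tq^tn^t} - \ldots + \frac{x}{qn} + \frac{x^2}{2q^2n^2} + \ldots$$

Write $x = \tau c$, where $c^2 = 2pqn = 2\sigma^2$.

$$\log\frac{P_x}{P_0} = -\frac{\tau^2c^2 + \tau c}{c^2} + \frac{2\tau^3c^3 + 3\tau^2c^2 + \tau c}{3c^4}\,(q-p) - \frac{2}{3}\cdot\frac{\tau^4c^4 + 2\tau^3c^3 + \tau^2c^2}{c^6}\,(1-3pq) - \ldots + \frac{2p\tau c}{c^2} + \frac{2p^2\tau^2c^2}{c^4} + \ldots,$$

since $p + q = 1$,

$$= -\tau^2 - \frac{q-p}{c}\left(\tau - \tfrac{2}{3}\tau^3\right) + \frac{\tau^2}{c^2}\left\{(q-p) - \tfrac{2}{3}\tau^2(1-3pq) + 2p^2\right\} + \text{terms involving } \frac{1}{c^3}.$$

Regard $\tau$ as finite; that is, consider only those values of $x$ which are comparable with $\sqrt{pqn}$.

If we neglect $\frac{1}{c}$ (that is, if we neglect $\frac{1}{\sqrt{n}}$), we have

$$P_x = P_0\,e^{-\tau^2} = P_0\,e^{-\frac{x^2}{c^2}} \qquad (17)$$

If we keep $\frac{1}{c}$, neglecting $\frac{1}{c^2}$ (that is, neglecting $\frac{1}{n}$), we have

$$P_x = P_0\,e^{-\tau^2}\cdot e^{-\frac{q-p}{c}\left(\tau - \frac{2}{3}\tau^3\right)} = P_0\,e^{-\tau^2}\left\{1 - \frac{q-p}{c}\left(\tau - \tfrac{2}{3}\tau^3\right)\right\}, \text{ since } \frac{1}{c^2} \text{ is neglected,}$$

$$= P_0\,e^{-\frac{x^2}{2\sigma^2}}\left\{1 - \kappa\left(\frac{x}{2\sigma} - \frac{x^3}{6\sigma^3}\right)\right\},$$

since $c = \sqrt{2}\cdot\sigma = \sqrt{2pqn}$, and $\kappa = \frac{q-p}{\sqrt{pqn}} = \frac{q-p}{\sigma}$.

The value of $P_0$ may be obtained from Stirling's theorem for factorials (Appendix, Note 3 (134)), viz.: $m! = m^m\sqrt{2\pi m}\cdot e^{-m+\frac{1}{12m}}$, when $\frac{1}{m^2}$ is neglected, and $= m^m\sqrt{2\pi m}\,e^{-m}$, when $\frac{1}{m}$ is neglected.

$$P_0 = \frac{n!}{(pn)!\,(qn)!}\,p^{pn}q^{qn} = \frac{n^n\sqrt{2\pi n}}{(pn)^{pn}(qn)^{qn}\sqrt{2\pi pn\cdot 2\pi qn}}\cdot e^{-n+pn+qn}\,p^{pn}q^{qn},$$

neglecting $\frac{1}{n}$, &c.,

$$= \frac{1}{\sqrt{2\pi pqn}}, \text{ since } p + q = 1, \; = \frac{1}{c\sqrt{\pi}} = \frac{1}{\sigma\sqrt{2\pi}}.$$

Now write $y$ for $P_x$, and we obtain the equations

$$y = \frac{1}{\sqrt{2\pi pqn}}\,e^{-\frac{x^2}{2pqn}} = \frac{1}{c\sqrt{\pi}}\,e^{-\frac{x^2}{c^2}} = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}} \qquad (18)$$

when $\frac{1}{\sqrt{n}}$ is neglected, and

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}\left\{1 - \kappa\left(\frac{x}{2\sigma} - \frac{x^3}{6\sigma^3}\right)\right\} \qquad (19)$$

when $\frac{1}{\sqrt{n}}$ is retained and $\frac{1}{n}$ neglected.

These equations express the chances that when an $n$-fold experiment is made, as described above, the number of successes shall be $x$ in excess of $pn$, where $p$ is the chance of success in a single experiment.

The curve represented by $y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}$ is called the “normal curve of error.” * Its shape is shown in the diagram at the end of the book.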


An idea can be obtained of the importance of the term in $\kappa$ by taking $n = 1000$, $p = \frac{1}{10}$. Then $\sigma = \sqrt{90} = 9.5$ and $\kappa = .084$ approx. The chance is sensibly affected when $x$ is greater than $\sigma$.
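The effect of the skewness term can be seen numerically by setting the exact binomial term beside the normal ordinate and beside the ordinate with the correction in $\kappa$, where $\kappa = (q-p)/\sqrt{pqn}$. The sketch below uses the case n = 1000, p = 1/10 mentioned above; the function names are illustrative assumptions and the selected values of x are arbitrary.

```python
from math import comb, exp, pi, sqrt

def exact_term(n, p, r):
    """Exact chance of r successes in n trials."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def normal_ordinate(n, p, x):
    """First approximation: the normal curve of error at x in excess of pn."""
    sigma = sqrt(n * p * (1 - p))
    return exp(-x * x / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def corrected_ordinate(n, p, x):
    """Second approximation: normal ordinate times {1 - kappa(x/2s - x^3/6s^3)}."""
    q = 1 - p
    sigma = sqrt(n * p * q)
    kappa = (q - p) / sigma
    factor = 1 - kappa * (x / (2 * sigma) - x ** 3 / (6 * sigma ** 3))
    return normal_ordinate(n, p, x) * factor

n, p = 1000, 0.1
for x in (-20, -10, 0, 10, 20):
    r = round(n * p) + x
    print(x, exact_term(n, p, r), normal_ordinate(n, p, x), corrected_ordinate(n, p, x))
```

At about two standard deviations from pn the corrected ordinate lies noticeably nearer the exact term than the uncorrected one.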

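When $n$ is great the actual chance of one assigned number of successes is small, e.g. if $p = \frac{1}{2}$, $n = 1000$, the chance of exactly 500 (the most probable number of) successes is only $\frac{1}{40}$ approx. The measurement that we find useful, however, is not that of particular ordinates, but of the sum of the chances over a range of values, say from $x_1$ to $x_2$, where $x_2 - x_1$ is of the same order as $\sigma\,(= \sqrt{pqn})$.

By a well-known theorem † we can pass from summation of the ordinates to integration of an area, and the whole chance of a number of successes as great as $pn + x_1$ and not greater than $pn + x_2$ is $\int_{x_1}^{x_2} y\,dx$, where $y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}$ and terms involving $\frac{1}{\sqrt{n}}$ are neglected.

Writing $z$ for $\frac{x}{\sigma}$, we have $\int_{x_1}^{x_2} y\,dx = \frac{1}{\sqrt{2\pi}}\int_{z_1}^{z_2} e^{-\frac{1}{2}z^2}\,dz$, and a table suitable for evaluating this function is given on p. 271.

In the following paragraph important constants connected with the function in question are obtained.

Area of curve $= \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}\,dz =$ limit of $(p+q)^n$ when $n$ tends to infinity $= 1$.

$$\int_{-\infty}^{\infty} e^{-\frac{1}{2}z^2}\,dz = \sqrt{2\pi}; \qquad \int_{-\infty}^{\infty} e^{-u^2}\,du = \sqrt{\pi}; \qquad \int_{-\infty}^{\infty} e^{-\frac{x^2}{2\sigma^2}}\,dx = \sigma\sqrt{2\pi}.$$

* See Edgeworth, Encyc. Brit. Vol. XXII., article Probability, pp. 391 seq.

† Appendix, Note 4.

Write $m_s = \int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}\cdot x^s\,dx$, for the $s$th moment about the average, which is the origin, the area being 1.

The curve is symmetrical about the ordinate through the origin, and $m_{2t+1} = 0$ for all values of $t$.*

$$m_2 = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} x^2\,e^{-\frac{x^2}{2\sigma^2}}\,dx = \left[-\frac{\sigma}{\sqrt{2\pi}}\,x\,e^{-\frac{x^2}{2\sigma^2}}\right]_{-\infty}^{\infty} + \frac{\sigma}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{x^2}{2\sigma^2}}\,dx$$

$$= 0 + \sigma^2 = \sigma^2 \qquad (21)$$

as was already known from formula (13).

$$m_{2t} = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} x^{2t}\,e^{-\frac{x^2}{2\sigma^2}}\,dx = \left[-\frac{\sigma}{\sqrt{2\pi}}\,x^{2t-1}\,e^{-\frac{x^2}{2\sigma^2}}\right]_{-\infty}^{\infty} + \frac{(2t-1)\sigma^2}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} x^{2t-2}\,e^{-\frac{x^2}{2\sigma^2}}\,dx$$

$$= 0 + (2t-1)\,\sigma^2\,m_{2t-2} \qquad (22)$$

Hence $m_4 = 3\sigma^2\cdot m_2 = 3\sigma^4 = 3m_2^2$, and

$$\beta_2 = \frac{m_4}{m_2^2} = 3, \qquad \beta_2 - 3 = 0,$$

as may also be obtained from p. 264, when $n$ is infinite.

$$m_{2t} = (2t-1)(2t-3)\ldots 3\cdot 1\cdot\sigma^{2t}, \text{ by induction,}$$

$$= \frac{(2t)!}{2^t\,t!}\,\sigma^{2t} \qquad (23)$$

E.g. $m_6 = 15\sigma^6$, $m_8 = 105\sigma^8$.

$\eta$, the mean deviation (see p. 111), since the area is unity,

$$= \frac{2}{\sigma\sqrt{2\pi}}\int_0^{\infty} x\,e^{-\frac{x^2}{2\sigma^2}}\,dx = \left[-\frac{2\sigma}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}\right]_0^{\infty} = \sigma\sqrt{\frac{2}{\pi}} \qquad (24)$$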


and 



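* For $m_{2t+1} = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} x^{2t+1}\,e^{-\frac{x^2}{2\sigma^2}}\,dx = \int_{-\infty}^{\infty}\phi(x)\,dx$, say, $= \int_0^{\infty}\phi(x)\,dx + \int_{-\infty}^{0}\phi(x)\,dx = \int_0^{\infty}\phi(x)\,dx - \int_0^{\infty}\phi(x)\,dx = 0$, since $\phi(-x) = -\phi(x)$.

The “probable error” (see p. 113) is obtained by finding from the table the value of $z$ which makes $\frac{1}{\sqrt{2\pi}}\int_0^z e^{-\frac{1}{2}z^2}\,dz = \frac{1}{4}$.

This value has been calculated as $x = z\sigma = .6744900\,\sigma$.

A drawing of the curve is given at the end of the book. The points of inflection are obtained by equating $D_x^2\,y$ to zero, where $y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}$.

Thus

$$\log y + \text{const.} = -\frac{x^2}{2\sigma^2}, \qquad \frac{1}{y}\,D_x\,y = -\frac{x}{\sigma^2}, \qquad D_x^2\,y = y\left(\frac{x^2}{\sigma^4} - \frac{1}{\sigma^2}\right) = 0 \text{ at the points of inflection,}$$

and $x = \pm\sigma$ (25)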

The area of that part of the curve which stands on the base 0 to $\sigma$ is, of course, the tabular value of

$$\frac{1}{\sqrt{2\pi}}\int_0^1 e^{-\frac{1}{2}z^2}\,dz = F(1) = .3413;$$

and by a similar use of the table we readily find the following approximate values:

Proportion of Area of Curve Standing on Certain Bases.

  Base.           Area.        Base.                 Area.
  0 to  .2σ      .07926       -.2σ to  +.2σ         .1585
  0 to  .6σ      .2257        +.2σ to  +.6σ         .1465
  0 to 1.0σ      .3413        +.6σ to +1.0σ         .1156
  0 to 1.4σ      .4192       +1.0σ to +1.4σ         .0779
  0 to 1.8σ      .4641       +1.4σ to +1.8σ         .0449
  0 to 2.2σ      .4861       +1.8σ to +2.2σ         .0220
  0 to 2.6σ      .4953       +2.2σ to +2.6σ         .0092
  0 to 3.0σ      .49865      +2.6σ to +3.0σ         .0033

Note. The mean deviation and probable error are defined in Part I. pp. 111-3.

The mean deviation is the average without regard to sign of the differences 
between the measurements of the items which make the group and a central 
measurement (generally the arithmetic average). 

The probable error is the distance which, measured left and right from a central position, includes exactly half the observations.

Table of Values * of $F(z) = \frac{1}{\sqrt{2\pi}}\int_0^z e^{-\frac{1}{2}z^2}\,dz$

   z     F(z)      z     F(z)      z     F(z)      z     F(z)      z     F(z)
  .00   .0000     .50   .1915    1.00   .3413    1.50   .4332    2.00   .4772
  .01   .0040     .51   .1950    1.01   .3438    1.51   .4345    2.02   .4783
  .02   .0080     .52   .1985    1.02   .3461    1.52   .4357    2.04   .4793
  .03   .0120     .53   .2019    1.03   .3485    1.53   .4370    2.06   .4803
  .04   .0160     .54   .2054    1.04   .3508    1.54   .4382    2.08   .4812
  .05   .0199     .55   .2088    1.05   .3531    1.55   .4394    2.10   .4821
  .06   .0239     .56   .2123    1.06   .3554    1.56   .4406    2.12   .4830
  .07   .0279     .57   .2157    1.07   .3577    1.57   .4418    2.14   .4838
  .08   .0319     .58   .2190    1.08   .3599    1.58   .4429    2.16   .4846
  .09   .0359     .59   .2224    1.09   .3621    1.59   .4441    2.18   .4854
  .10   .0398     .60   .2257    1.10   .3643    1.60   .4452    2.20   .4861
  .11   .0438     .61   .2291    1.11   .3665    1.61   .4463    2.22   .4868
  .12   .0478     .62   .2324    1.12   .3686    1.62   .4474    2.24   .4875
  .13   .0517     .63   .2357    1.13   .3708    1.63   .4484    2.26   .4881
  .14   .0557     .64   .2389    1.14   .3729    1.64   .4495    2.28   .4887
  .15   .0596     .65   .2422    1.15   .3749    1.65   .4505    2.30   .4893
  .16   .0636     .66   .2454    1.16   .3770    1.66   .4515    2.32   .4898
  .17   .0675     .67   .2486    1.17   .3790    1.67   .4525    2.34   .4904
  .18   .0714     .68   .2517    1.18   .3810    1.68   .4535    2.36   .4909
  .19   .0753     .69   .2549    1.19   .3830    1.69   .4545    2.38   .4913
  .20   .0793     .70   .2580    1.20   .3849    1.70   .4554    2.40   .4918
  .21   .0832     .71   .2611    1.21   .3869    1.71   .4564    2.42   .4922
  .22   .0871     .72   .2642    1.22   .3888    1.72   .4573    2.44   .4927
  .23   .0910     .73   .2673    1.23   .3907    1.73   .4582    2.46   .4931
  .24   .0948     .74   .2703    1.24   .3925    1.74   .4591    2.48   .4934
  .25   .0987     .75   .2734    1.25   .3944    1.75   .4599    2.50   .4938
  .26   .1026     .76   .2764    1.26   .3962    1.76   .4608    2.52   .4941
  .27   .1064     .77   .2794    1.27   .3980    1.77   .4616    2.54   .4945
  .28   .1103     .78   .2823    1.28   .3997    1.78   .4625    2.56   .4948
  .29   .1141     .79   .2852    1.29   .4015    1.79   .4633    2.58   .4951
  .30   .1179     .80   .2881    1.30   .4032    1.80   .4641    2.60   .4953
  .31   .1217     .81   .2910    1.31   .4049    1.81   .4649    2.62   .4956
  .32   .1255     .82   .2939    1.32   .4066    1.82   .4656    2.64   .4959
  .33   .1293     .83   .2967    1.33   .4082    1.83   .4664    2.66   .4961
  .34   .1331     .84   .2995    1.34   .4099    1.84   .4671    2.68   .4963
  .35   .1368     .85   .3023    1.35   .4115    1.85   .4678    2.70   .4965
  .36   .1406     .86   .3051    1.36   .4131    1.86   .4686    2.72   .4967
  .37   .1443     .87   .3078    1.37   .4147    1.87   .4693    2.74   .4969
  .38   .1480     .88   .3106    1.38   .4162    1.88   .4699    2.76   .4971
  .39   .1517     .89   .3133    1.39   .4177    1.89   .4706    2.78   .4973
  .40   .1554     .90   .3159    1.40   .4192    1.90   .4713    2.80   .4974
  .41   .1591     .91   .3186    1.41   .4207    1.91   .4719    2.82   .4976
  .42   .1628     .92   .3212    1.42   .4222    1.92   .4726    2.84   .4977
  .43   .1664     .93   .3238    1.43   .4236    1.93   .4732    2.86   .4979
  .44   .1700     .94   .3264    1.44   .4251    1.94   .4738    2.88   .4980
  .45   .1736     .95   .3289    1.45   .4265    1.95   .4744    2.90   .4981
  .46   .1772     .96   .3315    1.46   .4279    1.96   .4750    2.92   .4982
  .47   .1808     .97   .3340    1.47   .4292    1.97   .4756    2.94   .4984
  .48   .1844     .98   .3365    1.48   .4306    1.98   .4761    2.96   .4985
  .49   .1879     .99   .3389    1.49   .4319    1.99   .4767    2.98   .4986

   z     F(z)       z     F(z)        z     F(z)
  3.00  .49865    3.60  .499841     4.50  .499997
  3.20  .49931    3.80  .499928
  3.40  .49966    4.00  .499968

* Based on Dr. Sheppard's 7 figure Tables, Biometrika, Vol. II, Part II.

It has been calculated that $F(z) = \frac{1}{4}$ when $z = .67449$ approx. The quartiles of the curve are therefore at

$$\pm\,.67449\,\sigma \qquad (26)$$

and it is just as likely as not that a single observation will be within this range as without it. $.67449\,\sigma$ is therefore the “probable error” and is frequently used in preference to $\sigma$ to measure precision.
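Values of $F(z)$ such as those tabulated, and the multiplier for the probable error, can be recomputed from the error function. The sketch below assumes Python's standard `math.erf`; the function names and the bisection tolerance are illustrative choices, not part of the text.

```python
from math import erf, sqrt

def F(z):
    """Area of the normal curve standing on the base 0 to z (z in units of sigma)."""
    return 0.5 * erf(z / sqrt(2.0))

def probable_error_multiplier(tol=1e-10):
    """Find by bisection the z for which F(z) = 1/4, i.e. the quartile."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 0.25 > F(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(F(1.0), 4))                       # area on the base 0 to sigma
print(round(probable_error_multiplier(), 5))  # quartile in units of sigma
```

Any entry of the table can be checked in the same way by calling `F` with the tabulated `z`.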


Algebraic Chance and Experience. 

The analysis so far has been purely abstract, the illustra- 
tions from cards and dice only having been taken to visualise 
the phrase " equally likely.” We must now consider what 
evidence there is that successes do occur in proportion to their 
algebraic probability. Though we should certainly be sur- 
prised if, in simple cases, successes were in a different propor- 
tion — if, for example, we found that 90 out of 100 coins tossed 
fell head uppermost, or 50 repeated draws of one card from a 
complete pack (shuffled after replacing each card drawn) 
were all hearts — yet this feeling hardly gives more than a 
presumption that in the universe there is some method in 
apparently chance events. We must appeal to experience and 
experiment. In a general way, it is the experience of players 
of games of chance that events do happen at any rate roughly 
in proportion to their algebraic probabilities ; canons of correct 
play in whist were based on this, and the odds were given in 
accordance with calculated probability. Insurance, both 
accident and life, is based on the belief that events in the bulk 
are predictable, though individual occurrences appear to be 
fortuitous, and this belief has been continually justified. A 
great number of experiments have been carried out directly 
for the purpose of comparing the frequency of the occurrence 
of events with their a priori chances, with very marked success. 
We can never, however, obtain a certainty that the preliminary 
condition of equal probability is satisfied completely, nor can 
we expect to obtain more than an approximate verification. 

Rough experiments can easily be made by quite simple 
means. 

Thus from numerous packs of cards, from which the picture cards had been removed, 4 cards were drawn and the total of the pips on them counted and then the cards replaced. This was done 90 times.

The chance of getting a total of $r$ pips, if the number of packs was so large that the draws of the separate cards in the quartets could be taken as independent, is the coefficient of $x^r$ in $\frac{1}{10^4}(x + x^2 + \ldots + x^{10})^4$, i.e. in $\frac{x^4}{10^4}\left(\frac{1-x^{10}}{1-x}\right)^4$, and can be tabulated with the results of the experiment as follows:

     r          Aggregate      × 90,             Experimental
                chance.        “Expectation.”    result.
   4 to  9       .0126           1.134              0
  10 „  14       .0871           7.839              7
  15 „  19       .2375          21.375             27
  20 „  24       .3256          29.304             25
  25 „  29       .2375          21.375             22
  30 „  34       .0871           7.839              9
  35 „  40       .0126           1.134              0

                1.0000          90.000             90

The total of all the pips in the 90 quartets was 1956, and the average per card 5.43. The average on all the cards in the packs was 5.5.

It is evident that the experiment corresponds with the expectation, approximately at any rate.
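The aggregate chances in the table can be recovered by repeated convolution, which is the same operation as expanding the fourth power of the generating polynomial. The sketch below uses exact fractions and assumes, as the text does, that the four draws are independent; the function name is illustrative.

```python
from fractions import Fraction

def sum_distribution(faces, draws):
    """Exact distribution of the total of `draws` independent cards,
    each equally likely to show any value in `faces`."""
    faces = list(faces)
    dist = {0: Fraction(1)}
    for _ in range(draws):
        new = {}
        for total, pr in dist.items():
            for f in faces:
                new[total + f] = new.get(total + f, Fraction(0)) + pr / len(faces)
        dist = new
    return dist

dist = sum_distribution(range(1, 11), 4)      # pips 1 to 10, four cards
groups = [(4, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 40)]
aggregate = [
    float(sum(dist.get(r, Fraction(0)) for r in range(lo, hi + 1)))
    for lo, hi in groups
]
expectation = [90 * a for a in aggregate]     # expected numbers in 90 trials

for (lo, hi), a, e in zip(groups, aggregate, expectation):
    print(lo, hi, round(a, 4), round(e, 3))
```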

Bernoulli's Laws.

We must next inquire what correspondence between theoretical and expected frequency the theory itself leads us to expect. The Law of Error supplies a test.

Consider the group $r = 15$ to 19 in the above experiment. The chance of finding a number in this range is $.2375 = p$. In 90 experiments the chance of finding a number in this range $t$ times is the $(t+1)$th term of $(q+p)^{90}$. The most likely number of successes is 21 or 22 and the standard deviation of the possible number of successes is $\sqrt{pqn}$ where
n = 90, i.e., about 4. In such a multiple experiment many 
times repeated, the chance of getting anything from 17 to 26 
successes in the group is found from the Table to be about 
§ ; that we should obtain so great a number as 27 (as in 
the experiment tabulated) the chance is about J. It is very 
unlikely that we should have a divergence from 21 by as much 
as 3 times the standard deviation ; that is, more than 33 or 
less than 9 occurrences are very improbable. 
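The chances quoted from the Table for this group can be set beside exact binomial sums. The following sketch sums the terms of $(q+p)^{90}$ directly with p = .2375; the variable names are illustrative, and the figures it prints are exact sums, not the tabular approximations used in the text.

```python
from math import comb, sqrt

n, p = 90, 0.2375
q = 1 - p

def term(t):
    """The (t + 1)th term of (q + p)^n: the chance of exactly t successes."""
    return comb(n, t) * p ** t * q ** (n - t)

sigma = sqrt(n * p * q)                                    # about 4
chance_17_to_26 = sum(term(t) for t in range(17, 27))      # 17, 18, ..., 26
chance_27_or_more = sum(term(t) for t in range(27, n + 1))
chance_outside_3_sigma = 1 - sum(term(t) for t in range(9, 34))  # fewer than 9 or more than 33

print(round(sigma, 2))
print(round(chance_17_to_26, 3), round(chance_27_or_more, 3), chance_outside_3_sigma)
```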

This process, stated more generally, leads to Bernoulli's Laws, which may be paraphrased as follows. If an experiment, in which the chance of success is $p$, is performed $n$ times, and $p'n$ is written for the number of successes, then as $n$ is increased $p'$ tends to approach $p$. The chance of the occurrence of a deviation greater than $p \sim p'$ is

$$2\int_z^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}\,dz, \quad\text{where}\quad z = \frac{(p \sim p')\sqrt{n}}{\sqrt{p(1-p)}},$$

and hence as $\sqrt{n}$ increases the chance of any assigned deviation diminishes. By increasing $n$ sufficiently the chance can be made as small as we please.*

* Notice that $p \sim p'$ is the deviation of the proportions. The resulting actual deviation is $pn \sim p'n$, and $z$ should then be written $\frac{pn \sim p'n}{\sqrt{p(1-p)n}}$, and the chance increases as $\sqrt{n}$ increases.

Now it is the result of general experience and many experiments that Bernoulli's Laws can be realised in fact.

If, then, we can obtain the condition of a priori equally likely occurrences, we may calculate the chances of various events by the methods of mathematical probability, and expect that our calculations will be realised in fact within a margin determinable by the law of error.

On the following pages the results of various experiments are shown. The first three compare the distribution found with that given by the law of error, and the remainder show the working method of determining the size of a class in a large group by the method of sampling.


Examples.

1. If a digit is taken at random the chance that it will be less than 5 (0, 1, 2, 3 or 4) is $\frac{1}{2}$. The digits in the 7th decimal place of a book of logarithms were taken 50 at a time and the number ($r$) of digits less than 5 was noted. The chance of finding $r$ such digits is the $(r+1)$th term in the expansion of $(\frac{1}{2} + \frac{1}{2})^{50}$. $n = 50$, $p = q = \frac{1}{2}$, $\sqrt{pqn} = 3.535 = \sigma$.

$pn$, the most probable number, is 25. The chance of a number between 25 and $25 + x$ is $F(z)$ in the table, p. 271, where $z = \frac{x}{\sigma} = \frac{x}{3.535}$, if we assume that $n = 50$ is large enough in a symmetrical curve to allow the use of the normal curve instead of the binomial series. The 50-fold experiment was performed 300 times.


    r        z        F(z)    Differences *   × 300   Expected †   Actual    at r =
                                                       number of occurrences.
  13.5   -3.2522    .4994       .0008           .2     0               1       14
  14.5   -2.9694    .4986       .0020           .6     0 or 1          0       15
  15.5   -2.6866    .4966       .0047          1.4     1 or 2          3       16
  16.5   -2.4038    .4919       .0088          2.6     2 or 3          2       17
  17.5   -2.1210    .4831       .0161          4.8     5               3       18
  18.5   -1.8382    .4670       .0270          8.1     8               7       19
  19.5   -1.5554    .4400       .0416         12.5     12 or 13        9       20
  20.5   -1.2726    .3984       .0595         17.85    18             18       21
  21.5    -.9898    .3389       .0787         23.6     24             26       22
  22.5    -.7070    .2602       .0959         28.8     29             21       23
  23.5    -.4242    .1643       .1082         32.5     32 or 33       32       24
  24.5    -.1414    .0561       .1122         33.7     34             42       25
  25.5    +.1414    .0561       .1082         32.5     32 or 33       36       26
  26.5    +.4242    .1643       .0959         28.8     29             30       27
  27.5    +.7070    .2602       .0787         23.6     24             28       28
  28.5    +.9898    .3389       .0595         17.85    18             15       29
  29.5   +1.2726    .3984       .0416         12.5     12 or 13       16       30
  30.5   +1.5554    .4400       .0270          8.1     8               5       31
  31.5   +1.8382    .4670       .0161          4.8     5               2       32
  32.5   +2.1210    .4831       .0088          2.6     2 or 3          2       33
  33.5   +2.4038    .4919       .0047          1.4     1 or 2          1       34
  34.5   +2.6866    .4966       .0020           .6     0 or 1          1       35
  35.5   +2.9694    .4986       .0008           .2     0               0       36
  36.5   +3.2522    .4994

                                              299.6


The agreement is as close as the theory leads us to expect (see Chapter X). The standard deviation a priori is $\sqrt{pqn} = 3.535$. We can also find the standard deviation of the observations a posteriori by taking the square root of the second moment as on p. 253. The average is 25.043. The second moment of the observations about an origin at 25 is $(1 \times 11^2 + 0 \times 10^2 + 3 \times 9^2 + \ldots + 1 \times 10^2) \div 300 = 11.30$, and about the average is $11.300 - .043^2 = 11.298$. The square root is 3.361, which differs from the a priori value by .174, which is a not improbable deviation (see formula (120) below).
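The a posteriori figures can be recomputed from the observed numbers of occurrences at each value of r. The sketch below re-enters the "Actual" column of the table above as a dictionary (r mapped to number of occurrences); the variable names are illustrative.

```python
from math import sqrt

# Observed occurrences at each r in the 300 trials (from the "Actual" column).
observed = {
    14: 1, 15: 0, 16: 3, 17: 2, 18: 3, 19: 7, 20: 9, 21: 18, 22: 26,
    23: 21, 24: 32, 25: 42, 26: 36, 27: 30, 28: 28, 29: 15, 30: 16,
    31: 5, 32: 2, 33: 2, 34: 1, 35: 1, 36: 0,
}

total = sum(observed.values())
average = sum(r * f for r, f in observed.items()) / total

# Second moment about the working origin 25, then transferred to the average.
m2_about_25 = sum(f * (r - 25) ** 2 for r, f in observed.items()) / total
m2_about_average = m2_about_25 - (average - 25) ** 2

sd_a_posteriori = sqrt(m2_about_average)
sd_a_priori = sqrt(50 * 0.5 * 0.5)            # sqrt(pqn) with n = 50, p = q = 1/2

print(total, round(average, 3), round(m2_about_25, 2))
print(round(sd_a_posteriori, 3), round(sd_a_priori, 3))
```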

* Thus when $r = 13.5$ and 14.5, $F(z) = .4994$ and .4986. The difference .0008, × 300, is the expected number at $r = 14$.

† Nearest whole numbers from the previous column.

2. Instead of finding the expectation at each value, we can test the distribution by the method illustrated in the following example.

In a book, in which a page contained 37 lines, it was counted on each of 100 pages in how many cases the first (complete) word in a line contained 1, 2, or 3 letters. In 3700 lines such first words occurred 1317 times. The chance, then, that a first word contained 3 letters or less was

$$p = \frac{1317}{3700}, \qquad q = 1 - \frac{1317}{3700} = \frac{2383}{3700}.$$

The chance of finding $r$ such first words in a page was approximately the $(r+1)$th term in $(q+p)^{37}$.

The a priori standard deviation is $\sqrt{pqn} = 2.913 = \sigma$.

The occurrences were as follows.

  Number of first words       Number of pages on which
  of 3 letters or less.       these occurred.
          7                          1
          8                          2
          9                          9
         10                          6
         11                          8
         12                         17
         13                         15
         14                         12
         15                         13
         16                          5
         17                          4
         18                          2
         19                          3
         20                          2
         21                          0
         22                          1

Average $13.17 = \bar{x}$; standard deviation calculated from the observations 2.90. Now calculate the number of cases to be expected in grades each of $\sigma$ measured from the average.

                                           Difference
                                           × 100.        Occurrences.
  x̄ - 3σ =  4.47     F(-3) = .499
                                             2.3         Under 7½          1
  x̄ - 2σ =  7.37     F(-2) = .477
                                            13.6         7½ to 10½        17
  x̄ -  σ = 10.27     F(-1) = .341
                                            34.1         10½ „ 13½        40
  x̄      = 13.17     F(0)  = 0
                                            34.1         13½ „ 16½        30
  x̄ +  σ = 16.07     F(1)  = .341
                                            13.6         16½ „ 19½         9
  x̄ + 2σ = 18.97     F(2)  = .477
                                             2.3         19½ „ 22½         3
  x̄ + 3σ = 21.87     F(3)  = .499

                                           100.0                         100


In observations where the measurements are necessarily integral, it is not easy to adjust the grades to multiples of $\sigma$. But where the observational grades are narrow, or the measurements continuous, this method (proceeding by equal submultiples of $\sigma$) is rapid, and since the grading can be decided before the test is applied, affords a good and simple test.
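The grading test of this example can be sketched as follows. The page counts are re-entered from the table of occurrences; the half-integer cut points are the observational grades used in the text, and the outermost grades are taken to absorb the small tails beyond 3σ, which is an assumption made here so that the expected numbers total 100. All names are illustrative.

```python
from math import erf, sqrt, inf

# Number of pages (out of 100) on which r short first words occurred.
pages = {7: 1, 8: 2, 9: 9, 10: 6, 11: 8, 12: 17, 13: 15, 14: 12,
         15: 13, 16: 5, 17: 4, 18: 2, 19: 3, 20: 2, 21: 0, 22: 1}

N = sum(pages.values())
mean = sum(r * f for r, f in pages.items()) / N
sd = sqrt(sum(f * (r - mean) ** 2 for r, f in pages.items()) / N)

def F(z):
    """Area of the normal curve between 0 and z standard deviations."""
    return 0.5 * erf(z / sqrt(2.0))

# Expected numbers in grades one sigma wide, measured from the average.
edges = [-inf, -2, -1, 0, 1, 2, inf]
expected = [N * (F(b) - F(a)) for a, b in zip(edges, edges[1:])]

# Observed numbers, graded at the half-integer boundaries nearest the sigma grades.
cuts = [7.5, 10.5, 13.5, 16.5, 19.5]
observed = [0] * 6
for r, f in pages.items():
    observed[sum(r > c for c in cuts)] += f

for e, o in zip(expected, observed):
    print(round(e, 1), o)
```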

3. A similar experiment was made with a list of firms, in which there were 74 pages containing about 40 names each. Each firm had been marked for administrative purposes if it employed a certain number of women. One-fifth of all the firms were so marked. On any page the chance of finding $r$ firms was therefore the $(r+1)$th term in $(q+p)^{40}$ where

$$p = \tfrac{1}{5}, \qquad \sigma = \sqrt{\tfrac{1}{5}\cdot\tfrac{4}{5}\cdot 40} = 2.53, \qquad pn = 8.$$

                                          Expected.     Actual.
  Between pn + 2σ   and  pn + 3σ             1.7        2 or 3
     „    pn + 1½σ   „   pn + 2σ             3.3        5, 6 or 7
     „    pn + σ     „   pn + 1½σ            6.8        4 or 5
     „    pn + ½σ    „   pn + σ             11.0        9, 10, 11
     „    pn         „   pn + ½σ            14.2        13 or 14
     „    pn - ½σ    „   pn                 14.2        15 or 16
     „    pn - σ     „   pn - ½σ            11.0        8
     „    pn - 1½σ   „   pn - σ              6.8        7
     „    pn - 2σ    „   pn - 1½σ            3.3        2
     „    pn - 3σ    „   pn - 2σ             1.7        5


The alternatives in the final column are due to the difficulty 
of adjusting the entries to the predetermined grades. 

In this case the preliminary condition of independence is 
not completely fulfilled ; the chance of finding a marked name 
should not be affected by the presence or absence of marked 
names on the same page ; but in fact in some cases the name of 
a firm was repeated for each of its branches, and all the branches 
did or all did not employ women. 


Application to Sampling . 

One of the principal uses of the theorem relating to the 
number of successes to be expected in a given number of trials 
is in the examination of a large group by means of samples. 
In its simplest form the method is as follows. 

In a “universe” containing $N$ things or persons, $pN$ possess a defined attribute, where $N$ is known but $p$ is not known.

$n$ things are selected at random from the universe, and of them $p'n$ are found to possess the attribute.

If $\frac{n}{N}$ is small,* and if in the process of selection everything in the universe has an equal chance of being chosen, and if the choice of one thing does not influence the choice of any other, then the chance of finding $(p + x)n$ things is given by

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}, \quad\text{where}\quad \sigma = \sqrt{\frac{pq}{n}},$$

and the table on p. 271 can be applied. The precision, measured by $\frac{1}{\sigma}$, increases with $\sqrt{n}$.

* The necessary correction, when $\frac{n}{N}$ is not negligible, is given below, pp. 282-4.

It is shown below (p. 417) that in evaluating $\sigma$, the value $p'$, observed in the sample, can be substituted for the unknown true value $p$.

The result may be stated thus: the value of $p$ in the universe is $p' \pm \sqrt{\frac{p'q'}{n}}$, the expression meaning that $p'$ is the most probable value from the data, and that the chances of variations from $p'$ are given by the Table, p. 271, where the standard deviation (the unit in the Table) is $\sqrt{\frac{p'q'}{n}}$.

It is clear that this value can only be applied to the defined universe, the members of which have the chance of being enumerated. The importance of this and other conditions can be best illustrated by an example.

In Reading 609 working-class houses were visited, and in 154 of them it was found that there were more than 1 and less than 2 inhabitants per room. $n = 609$, $p'n = 154$, $p' = .253$, $\sqrt{p'q'/n} = .0176$. The proportion of houses thus occupied is $.253 \pm .0176$.
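The arithmetic of the Reading example is a direct application of $p' \pm \sqrt{p'q'/n}$. The helper below is a minimal sketch, with a hypothetical function name, that returns the sample proportion and its standard deviation.

```python
from math import sqrt

def proportion_with_standard_deviation(with_attribute, n):
    """Sample proportion p' and its standard deviation sqrt(p'q'/n)."""
    p_dash = with_attribute / n
    q_dash = 1 - p_dash
    return p_dash, sqrt(p_dash * q_dash / n)

# Reading: 154 of the 609 houses visited had the attribute.
p_dash, sd = proportion_with_standard_deviation(154, 609)
print(round(p_dash, 3), round(sd, 4))
```

Because the standard deviation falls only as the square root of n, quadrupling the sample is needed to halve it.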

The " universe ” here is the group of houses (about 12,000) 
from which the 609 were selected. This group was determined 
from a local directory, from which middle-class and large houses 
were eliminated by the help of a list of “ principal residents ” 
and by local knowledge, and non-residential houses were 
omitted. The accuracy of the measurement for working-class 
Reading depends on the completeness and accuracy of the 
directory and on the appositeness of the method of elimina- 
tion. If a rookery of slum dwellings had been omitted, by so 
much the universe would have been curtailed ; or if a street 
of middle-class houses had been included the universe would 
have been extended, unless in the process of investigation the 
error had been found. 

In this case the selection was made by marking one house in 20 throughout the amended directory. It is shown on p. 332 that this gives a more precise result than if a purely random method had been followed. A general method of securing randomness is to give numbers from 1 to $N$ to the things in the universe, and by the use of tables of figures or otherwise select $n$ numbers.* Great care must be taken to ensure pure randomness or some method which gives a more precise result than pure randomness. It was found, for example, that in the latitude experiment (p. 281) randomness was not obtained by selecting pages and dropping a pencil on names; the entries in a page were not independent of each other. Any divergence from the rule that every item must have the same chance of inclusion may affect the result disastrously.

Of course inaccuracy of information (e.g. as to the number of persons resident in a house) is to be avoided; but if the errors due to this source are equally likely to be in excess or defect, the result is not much affected.

It should be noticed that the accuracy of the result depends on $n$ the number in the sample, and not on $N$ the number in the universe. The size of the universe only affects the problem in that, when the $N$ things are numerous and scattered, it is difficult to get an accurate enumeration and secure that each has an equal chance of being chosen, and it becomes possible that parts are omitted from ignorance of their existence, which differ essentially from the major parts included. Further, when $p$ is small, $pN$ may be moderately large, while $pn$ is relatively small. Now if $pn$ is small, the approximation to the curve of error (p. 265) tends to break down, and the term involving $\kappa\left(= \frac{q-p}{\sqrt{pqn}}\right)$ is not negligible; so that the terms of the binomial $(q+p)^n$ should be used instead of the integral table. A little examination of numerical cases will show that for certain small values of $p$ it is quite possible that no thing having the attribute will be found; thus, if 30 houses in a town containing 10,000 houses are overcrowded, and 800 houses are examined, the chance of finding no overcrowded house is $q^n p^0$, where $p = .003$, $q = .997$, $n = 800$, that is .09; so that a report based on the sample might not contain reference to overcrowding, unless to say that there was no evidence of it. But if $p = .03$, $q^n p^0$ is only about $2.6 \times 10^{-11}$, and some instances would certainly be found. As to the chances of occurrence of small numbers, see p. 284 below.

* For example, if $N = 10{,}000$ and $n = 500$, we might take the last four digits of pages in 7 figure tables till we had 500 numbers all between 0 and 10,001, and investigate the things to which these numbers were affixed. This method was used in the experiment on the number of persons in a parish. See next page.
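The chance that a sample misses a rare attribute altogether is the first term of the binomial, $q^n$. A minimal sketch with the two values of p discussed above (the function name is illustrative):

```python
def chance_of_finding_none(p, n):
    """First term of (q + p)^n: no thing with the attribute among n chosen."""
    return (1 - p) ** n

rare = chance_of_finding_none(0.003, 800)     # 30 overcrowded houses in 10,000
common = chance_of_finding_none(0.03, 800)    # ten times as frequent

print(round(rare, 2), common)
```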

Finally it should be emphasised that when the things that should be included are determined by marking in a list or otherwise, no difficulties of measurement should be allowed to stand in the way of their inclusion. If a householder refuses information, or part of a consignment of goods is out of the way, there is a presumption that the characteristics of the house or the goods are not normal, and unless the difficulty is overcome, some part of the universe is not represented.


Examples of Sampling . 

1. The 12,830 civil parishes enumerated in the Census of 
England and Wales, 1911, were numbered, and 250 selected 
by numbers taken from logarithmic tables. The following 
table compares the distribution of the parishes according to 
their populations in the sample and in the whole group (winch 
is set out in the Census Volume, Cd. 6258, p. 428). 


Number of Persons in Parish. 


Number of parishes in 

Under 

100. 

100 to 

moo. 

sample of 250. 

. 

35 

52 

1000 p ' 

. 

140 

208 

i 

M 

• 

22 

26 

Actual per 1000 

. 

152 

192 


aoo to 

300 to 

400 to 

500 to 

1000 or 

300. 

400. 

5 °°- 

1000. 

more. 

42 

27 

20 

41 

33 

168 

108 

80 

I64 

132 

*4 

20 

17 

23 

21 

M 7 

108 

80 

173 

146 


Here (to take the first column as an example) 35 were found 
in the sample of 250 with population less than 100. 


P' = 


- 35 . 


250 


14. 


The forecast per 1000 parishes is therefore *14 of 1000 = 140. 
The standard deviation of p' is V ^ ' ^50 ^ ~ = " 022 > P‘ 2 7 ^> and ■ 

therefore the standard deviation of 1000 p', i.e. of the forecast 
140, is 22. Actually in England and Wales there were 152 per 
1000 parishes with less than 100 people. The forecast differs 
from the fact by about half the standard deviation. (Statistical 
Journal, 1912-13, p. 182.) 
2. From a list of the rates of dividends of 3878 companies
400 were selected and tabulated.

Rate of Dividend per cent.

                                 Below £3    £3     £4     £5     £6     £…
Number of companies in sample .     34      108    117     60     48     33
1000 p' . . . . . . . . . . . .     85      270    292½   150    120     82½
1000 √(p'q'/400)  . . . . . . .     14       22     23     18     16     14
In full list per 1000 . . . . .     75      272    311    177    108     57

(Statistical Journal, 1906, p. 552.)


3. From a geographical index containing 31,210 names
500 places were selected and their latitudes tabulated. To
secure randomness the columns of names were numbered and
selection made from numbers in mathematical tables; a
foot-rule was placed over the column, and the entry against the
number of inches on the rule determined by the first digit of
the longitude of the first place in the column was selected.
This elaborate method was found necessary to secure
independence.

Latitude, North or South.

                     0° to  10° to  20° to  30° to  40° to  50° to  60° to  70° to  80° to
                      10°    20°     30°     40°     50°     60°     70°     80°     90°
Number of places in
  sample . . . . .     22     56     104     103      93     112       9       1       0
1000 p'  . . . . .     44    112     208     206     186     224      18       2       0
1000 √(p'q'/500) .      9     14      18      18      17      19       6       ?       ?
In full list per
  1000 . . . . . .     51    111     201     200     200     215      18     3.4     0.9


Notice that the places north of 80° N. and south of 80° S.
were missed in the accident of the selection. In another
selection where n = 2000, 1 per 1000 were found in these
latitudes.

4. Out of the householders' schedules of the 1911 Census,
1 in 50 in order throughout the files were selected in Shoreditch,
and the personnel of the households classified.

                          Occupied Persons.                 Unoccupied.
                       Males.            Females.
                    Over     Under     Over     Under     Over     Under
                  20 years 20 years  18 years 18 years  14 years 14 years   Total.
Number of persons
  in sample . . .    538      112      310       74       386      718      2138
1000 p' . . . . .    251       52      145       35       181      336      1000
1000 √(p'q'/2138)      9        5        8        4         8       10       ..
Distribution per
  1000 from Census
  tables  . . . .    258       55      144       33       185      325      1000
Case when the Universe is not practically Unlimited or the
Selections are not Independent.

In the statement of the experiment which leads to the
normal curve of error (pp. 263 seq.) it was assumed that the
chance of success for each throw or draw was always the same
(p. 261), and that each trial was uninfluenced by what had
already happened. In practice this condition is seldom
completely satisfied, but we can prove in a similar manner that
the normal law of error is obtained under a wider hypothesis.*

Let a universe contain N objects, of which pN possess a certain
quality or attribute, and qN do not (p + q = 1). Let a selection of
n be made in such a way that every object in the universe has the
same chance of being chosen. Write $P_x$ for the probability that
pn + x of the selected objects shall possess the quality in question.
E.g., if the “universe” is a box containing 1000 balls of which
100 are white and the rest coloured, and if the contents are
thoroughly mixed and 50 selected, then N = 1000, p = 1/10 (where
white is the attribute), n = 50, pn = 5, and $P_x$ is the probability
that 5 + x white balls are present in the selection.

The whole number of different possible selections is ${}_{N}C_{n}$.

The number of selections in which pn + x are white and the
remainder (qn − x) are coloured is
${}_{pN}C_{pn+x} \times {}_{qN}C_{qn-x}$.

Hence

$$P_x = \frac{{}_{pN}C_{pn+x} \times {}_{qN}C_{qn-x}}{{}_{N}C_{n}}
= \frac{(pN)!\,(qN)!\,n!\,M!}{(pn+x)!\,(pM-x)!\,(qn-x)!\,(qM+x)!\,N!},$$

where M = N − n.

Apply Stirling's theorem to the factorials, neglecting $\frac{1}{pn}$,
$\frac{1}{n}$ and smaller quantities. (App. formula (134).)

$$P_0 = (pN)^{pN}(qN)^{qN}\,n^{n}M^{M}\,(pn)^{-pn}(pM)^{-pM}(qn)^{-qn}(qM)^{-qM}\,N^{-N}
\cdot (2\pi)^{-\frac12}\, e^{0} \left(\frac{pN \cdot qN \cdot n \cdot M}{pn \cdot pM \cdot qn \cdot qM \cdot N}\right)^{\frac12},$$

the index of e being

pn + pM + qn + qM + N − pN − qN − n − M = 0, since p + q = 1.

* E.g. the chance of obtaining 3 aces in a hand of 13 dealt from a pack of
52 is $P_2 = {}_{4}C_{3} \times {}_{48}C_{10} \div {}_{52}C_{13}$; here N = 52,
n = 13, p = 1/13, x = 2. $P_2$ = .041 approx.
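When the indices are collected it is found that

$$P_0 = \sqrt{\frac{N}{2\pi\, pqnM}} \tag{27}$$

$$\frac{P_x}{P_0} = \frac{(pn)!\,(pM)!\,(qn)!\,(qM)!}{(pn+x)!\,(pM-x)!\,(qn-x)!\,(qM+x)!}
= \frac{(pn)^{pn}(pM)^{pM}(qn)^{qn}(qM)^{qM}\cdot(2\pi)^{0}\cdot e^{0}\cdot(pn\cdot pM\cdot qn\cdot qM)^{\frac12}}
{(pn+x)^{pn+x+\frac12}\,(pM-x)^{pM-x+\frac12}\,(qn-x)^{qn-x+\frac12}\,(qM+x)^{qM+x+\frac12}}$$

$$\therefore\ \frac{P_0}{P_x} = \left(1+\frac{x}{pn}\right)^{pn+x+\frac12}\left(1-\frac{x}{qn}\right)^{qn-x+\frac12}
\left(1-\frac{x}{pM}\right)^{pM-x+\frac12}\left(1+\frac{x}{qM}\right)^{qM+x+\frac12}$$

$$\log \frac{P_x}{P_0} = -\left(pn+x+\tfrac12\right)\log\left(1+\frac{x}{pn}\right) - \left(qn-x+\tfrac12\right)\log\left(1-\frac{x}{qn}\right)
- \left(pM-x+\tfrac12\right)\log\left(1-\frac{x}{pM}\right) - \left(qM+x+\tfrac12\right)\log\left(1+\frac{x}{qM}\right)$$

$$= -\left(pn+x+\tfrac12\right)\left(\frac{x}{pn}-\frac{x^2}{2p^2n^2}+\cdots\right)
+ \left(qn-x+\tfrac12\right)\left(\frac{x}{qn}+\frac{x^2}{2q^2n^2}+\cdots\right)
+ \left(pM-x+\tfrac12\right)\left(\frac{x}{pM}+\frac{x^2}{2p^2M^2}+\cdots\right)
- \left(qM+x+\tfrac12\right)\left(\frac{x}{qM}-\frac{x^2}{2q^2M^2}+\cdots\right)$$

$$= -\frac{x}{2}\left(\frac{1}{pn}-\frac{1}{qn}-\frac{1}{pM}+\frac{1}{qM}\right)
- \frac{x^2}{2}\left(\frac{1}{pn}+\frac{1}{qn}+\frac{1}{pM}+\frac{1}{qM}\right)
+ \frac{x^2}{4}\left(\frac{1}{p^2n^2}+\frac{1}{q^2n^2}+\frac{1}{p^2M^2}+\frac{1}{q^2M^2}\right) + \cdots$$

n is of course less than N, and we may take it (without loss of
generality) as less than ½N and therefore less than M.

Let pn and qn be at least moderately large, so that we proceed
in ascending powers of $\frac{1}{\sqrt{pn}}$.

A solution is then obtained if we take $\frac{x}{\sqrt n}$ as a quantity
comparable with unity, and therefore $\frac{x}{n}$ as of order $\frac{1}{\sqrt n}$
and $\frac{x^2}{n^2}$ as of order $\frac{1}{n}$, as on p. 266 above.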
Then neglecting terms of orders $\frac{1}{\sqrt n}$ and higher, we have

$$\log \frac{P_x}{P_0} = -\frac{x^2}{2}\left(\frac{p+q}{pqn} + \frac{p+q}{pqM}\right)
= -\frac{x^2 (n+M)}{2pqnM} = -\frac{x^2 N}{2pqnM}.$$

Write $\sigma^2$ for $\dfrac{pqnM}{N}$, and

$$y = P_x = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}.$$

This is the normal curve of error, and σ (as above shown,
formula (21)) is its standard deviation.

$$\sigma^2 = pqn \cdot \frac{M}{N} = pqn\left(1 - \frac{n}{N}\right) \tag{28}$$

and is smaller than its value (pqn) under the conditions of
pp. 261-7, but tends to reach it (as it should) when N
becomes indefinitely great.
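As a rough numerical check of formula (28), the following Python sketch compares the exact chance (the ratio of combinations given above) with the normal curve whose squared standard deviation is pqn(1 − n/N). The function names are hypothetical, and the size of the second selection (200 drawn from the box of 1000 balls) is an assumption chosen only to make pn moderately large.

```python
from math import comb, exp, pi, sqrt

def exact_chance(N, K, n, k):
    """Chance of exactly k marked objects when n are drawn from N, of which K are marked."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def normal_chance(N, K, n, k):
    """Ordinate of the normal curve with sigma^2 = pqn(1 - n/N), as in formula (28)."""
    p = K / N
    var = p * (1 - p) * n * (1 - n / N)
    x = k - p * n
    return exp(-x * x / (2 * var)) / sqrt(2 * pi * var)

# Footnote example: three aces in a hand of 13 dealt from a pack of 52.
aces = exact_chance(52, 4, 13, 3)
print("three aces:", round(aces, 4))

# Box of 1000 balls, 100 white; 200 drawn (the number drawn is an illustrative assumption).
N, K, n = 1000, 100, 200
for k in (14, 17, 20, 23, 26):
    print(k, round(exact_chance(N, K, n, k), 4), round(normal_chance(N, K, n, k), 4))
```

The two columns printed for the box of balls agree most closely near the average pn = 20, as the derivation leads one to expect.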


Law of Small Numbers.


In the deduction of the normal curve from the terms of
$(p + q)^n$ it was assumed that not only n, but also pqn, was
large. An interesting case arises when p is so small that pn
is no longer large, q being in that case nearly equal to 1.

Let u = pn, and be a small finite number.

$$p = \frac{u}{n}, \qquad q = 1 - \frac{u}{n}.$$

The chance of r successes in n independent experiments is

$$P_r = \frac{n!}{(n-r)!\,r!}\, p^r q^{n-r}
= \frac{u^r}{r!}\left(1-\frac{u}{n}\right)^{n} q^{-r}
\left\{\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{r-1}{n}\right)\right\}.$$

Neglect $\frac{r^2}{n}$; then the product of the r − 1 factors in brackets,
which is between 1 and $1 - \frac{r(r-1)}{2n}$, may be taken as 1.

Also $\left(1 - \frac{u}{n}\right)^{n}$ tends to $e^{-u}$, and
$q^{-r} = \left(1 - \frac{u}{n}\right)^{-r}$ tends to $e^{ur/n}$,
and to 1, as $\frac{r}{n}$ tends to 0.

In all

$$P_r = \frac{e^{-u} u^r}{r!} \tag{29}$$

when $\frac{r^2}{n}$, $\frac{ur}{n}$ and $\frac{u^2}{n}$ are neglected.

$$\sigma^2 = pqn = u\left(1 - \frac{u}{n}\right) \quad\therefore\ \sigma = \sqrt{u}, \text{ approx.} \tag{30}$$

$$\kappa = \left(1 - \frac{2u}{n}\right) \Big/ \sqrt{u\left(1 - \frac{u}{n}\right)} = \frac{1}{\sqrt u}, \text{ approx.} \tag{31}$$

The whole curve is then determined by u, without separate
reference to p and n, since its average is u, its standard
deviation $\sqrt u$ and its κ $\frac{1}{\sqrt u}$. It follows that the values
of p and n are not easily determined separately from
observations.

The greatest term of the binomial expansion is

$$P_{pn} = \frac{e^{-u} u^{u}}{u!}\ {}^{*}$$

when u is integral, and then

$$\frac{P_r}{P_{pn}} = \frac{u^{\,r-u}\, u!}{r!},$$

and this rapidly becomes small as $\frac{r}{u}$ passes through integral
values. E.g. if u = 6, and r = 3u, $P_{3u}$ = .00004.

Consequently the observed values never differ greatly from
their average. Attention has been directed to the agreement
between the fluctuations of small numbers and the law of
distribution thus described, and examples have been given by
Bortkiewicz (Das Gesetz der kleinen Zahlen, 1898) and
Mortara (Annali di Statistica, Serie V, vol. 4, 1912). It is
also interesting to notice that the theory leads to what may be
called the permanence of small numbers. If among a great
number of things there are a few which present some
particular feature, it is a matter of common experience that this
small number is seldom much exceeded and seldom entirely
vanishes; this experience applies to accidents, fires, the
traditional “Derby dog,” and to the rare events and
coincidences with which some newspapers fill their columns.
Specialists in all professions, from the doctor who treats only
one obscure disease of the ear to the dealer in curiosities, make
their livelihood dependent on this permanence of small numbers.

* If u is as great as 10, this differs from $\frac{1}{\sqrt{2\pi u}}$ by less than 1 per cent.

To take an example: Out of some 530,000 deaths annually
from all causes the following are the numbers from splenic
fever in the years 1875 to 1894:

5, 4, 10, 14, 12, 18, 9, 15, 8, 18, 11, 11, 11, 12, 7, 4, 3, 6, 7, 10.

Average 9.75 = pn = u.   $e^{-u}$ = .00005842.

    r          Σ e^{-u} u^r / r!           Forecast.   Actual.
    0 . . .        .00006      × 20 =        .001         0
  1 to  4          .0343              =       .7          3
  5 ,,  9          .4564              =      9.1          6
 10 ,, 14          .4408              =      8.8          8
 15 ,, 19          .0683              =      1.4          3
 20 . . .          small                                  0
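The forecast column can be recomputed from formula (29) with the short Python sketch below. It is an illustration, not part of the original text: the function name `small_number_chance` is hypothetical, and the grouping of values into bands follows the table.

```python
from math import exp, factorial

# Deaths from splenic fever, 1875 to 1894, as listed in the text.
deaths = [5, 4, 10, 14, 12, 18, 9, 15, 8, 18, 11, 11, 11, 12, 7, 4, 3, 6, 7, 10]
u = sum(deaths) / len(deaths)            # the average, pn

def small_number_chance(r, u):
    """Formula (29): chance of exactly r occurrences when the average is u."""
    return exp(-u) * u ** r / factorial(r)

bands = [(0, 0), (1, 4), (5, 9), (10, 14), (15, 19)]
forecast, actual = [], []
for lo, hi in bands:
    chance = sum(small_number_chance(r, u) for r in range(lo, hi + 1))
    forecast.append(len(deaths) * chance)                 # expected number of years out of 20
    actual.append(sum(lo <= d <= hi for d in deaths))     # observed number of years
    print(f"{lo:2d} to {hi:2d}: forecast {forecast[-1]:5.2f}, actual {actual[-1]}")
```

The computed forecasts agree with the printed column to the first decimal place; small differences in later decimals arise from the rounding of $e^{-u}$.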


Note added in 1936. The result is obtained also by a hypothesis
independent of p and n separately. Write q(t) for the chance that no
event occurs in time or space interval t. Then
$q(t_1) \times q(t_2) = q(t_1 + t_2)$, and so on, whence
$q(t) = \{q(1)\}^{t} = e^{-ht}$, say. Chance of one event
between t and t + dt $= -\dfrac{dq}{dt}\,dt = h e^{-ht}\,dt$. Write P(n, T) for chance of
n occurrences in T.

P(n + 1, T) = chance of n in t multiplied by chance of 1 in T − t

$$= \int_0^T P(n,t)\cdot h e^{-h(T-t)}\,dt$$

$$\therefore\ P(1,T) = \int_0^T P(0,t)\, h e^{-h(T-t)}\,dt = \int_0^T e^{-ht}\, h e^{-h(T-t)}\,dt = hT e^{-hT}$$

$$\therefore\ P(2,T) = \int_0^T ht\, e^{-ht}\, h e^{-h(T-t)}\,dt = \tfrac12 h^2 T^2 e^{-hT}$$

Continuing we have $P(n,T) = \dfrac{1}{n!}(hT)^n e^{-hT} = \dfrac{e^{-u}u^n}{n!}$, where u = hT =
average of series.

∴ hT is average number in T, h is number per unit interval, 1/h is
mean interval and is the only datum required.
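The step from P(n, T) to P(n + 1, T) in the note can be verified numerically. The Python sketch below is an illustration under stated assumptions: the values h = 1.5 and T = 4 are arbitrary, the function names are hypothetical, and Simpson's rule stands in for the exact integration.

```python
from math import exp, factorial

def P(n, T, h):
    """Chance of n occurrences in an interval T when h occur on the average per unit interval."""
    return exp(-h * T) * (h * T) ** n / factorial(n)

def next_from_integral(n, T, h, steps=2000):
    """Simpson's rule for the integral of P(n, t) * h * exp(-h (T - t)) dt from 0 to T."""
    def integrand(t):
        return P(n, t, h) * h * exp(-h * (T - t))
    width = T / steps                      # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(T)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * width)
    return total * width / 3

h, T = 1.5, 4.0                            # illustrative values only
for n in range(4):
    print(n + 1, P(n + 1, T, h), next_from_integral(n, T, h))
```

Each printed pair should agree, the integral of the chance of n occurrences reproducing the closed form for n + 1.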



CHAPTER III.


THE LAW OF GREAT NUMBERS (THE GENERALISED
LAW OF ERROR).

So far we have treated the normal curve of error as the
limit of the binomial $(q + p)^n$, and shown applications of its
integral to cases where p had a definite meaning. The same
equation $y = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}$, however, is found as the result of
much wider hypotheses, and it is the main purpose of this
chapter to develop them.

Before proceeding to the general law there are some
important propositions to consider as to the relation between
the standard deviation of a sum or average of magnitudes
selected from a large group or groups, and the standard
deviations of the magnitudes themselves. These propositions
(pp. 287-9) depend only on the fundamental laws of
probability, and are independent of any process of limits or
of neglect of small quantities.

Standard Deviation and Mean Cube of Error of a Sum and
Average.

Let $u_1, u_2 \ldots u_t \ldots u_{m_1}$ be $m_1$ measurements which form a
frequency group, and let $\bar u$ be their average and $\sigma_u$ their standard
deviation.

Let $u_t = \bar u + u_t'$.

Then $m_1 \bar u = S u_t$ and $S u_t' = 0$,

and

$$m_1 \sigma_u^2 = S u_t'^2 = S (u_t - \bar u)^2 = S u_t^2 - 2\bar u \cdot S u_t + m_1 \bar u^2
= S u_t^2 - 2\bar u \cdot m_1 \bar u + m_1 \bar u^2$$

and

$$S u_t^2 = m_1(\sigma_u^2 + \bar u^2) \tag{32}$$
Let $v_1, v_2 \ldots v_t \ldots v_{m_2}$ be $m_2$ measurements in a second
frequency curve, whose average and standard deviation are $\bar v$
and $\sigma_v$.

Now select at random one object from each group, say $u_s$ and $v_t$.
Required the average and standard deviation of the group formed
by all possible values of $u_s + v_t$, every double selection being quite
independent of every other. Let $H_2$ be the average and $s_2$ the
standard deviation of this group.

We will suppose that an indefinitely great number of
independent selections is made, so that in the new group the $m_1 \times m_2$
possible values of $u_s + v_t$ occur with equal frequency.

Then

$$H_2 \times m_1 \times m_2 = (u_1+v_1) + \cdots + (u_1+v_t) + \cdots + (u_1+v_{m_2})$$
$$\qquad + (u_2+v_1) + \cdots + (u_2+v_t) + \cdots + (u_2+v_{m_2})$$
$$\qquad + \cdots + (u_{m_1}+v_1) + \cdots + (u_{m_1}+v_t) + \cdots$$

($m_2$ terms in each of $m_1$ lines)

$$= m_2 \cdot S u_s + m_1 \cdot S v_t = m_2 m_1 \bar u + m_1 m_2 \bar v$$

$$\therefore\ H_2 = \bar u + \bar v \tag{33}$$

and

$$m_1 m_2 (s_2^2 + H_2^2) = S(u_s + v_t)^2
= (u_1+v_1)^2 + \cdots + (u_1+v_{m_2})^2 + (u_2+v_1)^2 + \cdots + (u_2+v_{m_2})^2 + \cdots$$
$$= m_2\, S u_s^2 + m_1\, S v_t^2 + 2 \cdot S u_s \cdot S v_t$$
$$= m_2 m_1(\sigma_u^2 + \bar u^2) + m_1 m_2(\sigma_v^2 + \bar v^2) + 2 m_1 \bar u \cdot m_2 \bar v$$
$$= m_1 m_2 \{\sigma_u^2 + \sigma_v^2 + (\bar u + \bar v)^2\}$$

$$\therefore\ s_2^2 = \sigma_u^2 + \sigma_v^2 \tag{34}$$

If the group was formed by the difference $u_s - v_t$ instead of the
sum, we should obtain in a similar way $H_2 = \bar u - \bar v$, but
$s_2^2 = \sigma_u^2 + \sigma_v^2$ as before.

Next let the sum (or difference) be formed from three groups,
the averages and standard deviations being $\bar u, \bar v, \bar w$ and
$\sigma_u, \sigma_v, \sigma_w$ for the groups, and $H_3$, $s_3$ for the sum or difference.

Then

$$H_3 = \bar u \pm \bar v \pm \bar w, \tag{35}$$

as can be readily shown.

We can obtain $s_3$ by supposing $u_s$ and $v_t$ first combined, and
then a w added, and using the formula already proved twice over.

$$s_3^2 = s_2^2 + \sigma_w^2 = \sigma_u^2 + \sigma_v^2 + \sigma_w^2 \tag{36}$$

and the formula can be extended by induction to any number of
groups.
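Formulae (33) and (34) can be confirmed by direct enumeration. The following Python sketch is illustrative only: the two small groups are arbitrary figures, the helper name `mean_and_variance` is hypothetical, and exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction
from itertools import product

def mean_and_variance(values):
    """Exact average and squared standard deviation of a finite group."""
    values = [Fraction(v) for v in values]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance

u = [2, 3, 7, 8]          # first group (arbitrary illustrative figures)
v = [1, 4, 10]            # second group

mean_u, var_u = mean_and_variance(u)
mean_v, var_v = mean_and_variance(v)

# All m1 x m2 double selections, each occurring with equal frequency.
sums = [a + b for a, b in product(u, v)]
diffs = [a - b for a, b in product(u, v)]

mean_sum, var_sum = mean_and_variance(sums)
mean_diff, var_diff = mean_and_variance(diffs)
print(mean_sum, var_sum, mean_diff, var_diff)
```

The squared standard deviation is the same for the group of sums and the group of differences, while the averages add or subtract.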
A very important case is when the standard deviations of
the original groups are equal, so that $\sigma_u = \sigma_v = \cdots = \sigma$, say.

If the sum is formed from n such groups, and its standard
deviation is s, we have

$$s^2 = s_n^2 = \sigma^2 + \sigma^2 + \cdots \text{ to } n \text{ terms} = n\sigma^2$$

$$\text{and}\quad s = \sigma\sqrt{n} \tag{37}$$

Next, instead of taking the sum of the n measurements, let
us take their average. Every term in the composite group is
then to be divided by n, and therefore the standard deviation
of the group of averages, $\sigma_a$ say, will be the standard
deviation of the group of sums divided by n.

$$\sigma_a = \frac{s}{n} = \frac{\sigma}{\sqrt n} \tag{38}$$

Finally, if the average is taken of n items, all selected
independently from the same (indefinitely large) initial group, so
that the chance of selecting any one of the n items is not
affected by previous selections, we have still $\sigma_a = \dfrac{\sigma}{\sqrt n}$.

In the following paragraphs it is assumed that the original
measurements are all from the averages of their groups, and
that therefore $0 = \bar u = \bar v = \cdots$, and $0 = Su = Sv = \cdots$

The mean cube for the sum of $u_s$ and $v_t$ is

$$\frac{1}{m_1 m_2}\, S(u_s + v_t)^3
= \frac{1}{m_1 m_2}\{m_2\, Su^3 + m_1\, Sv^3 + 3 \cdot Sv\, Su^2 + 3 \cdot Su\, Sv^2\}
= \frac{1}{m_1} Su^3 + \frac{1}{m_2} Sv^3$$

$= {}_u\mu_3 + {}_v\mu_3$, the sum of the third moments about the
average of the groups.

Hence $M_3$, the third moment of the sum of n items, all
from one group, $= n\mu_3$, where $\mu_3$ is the third moment of the
group, and for the sum

$$\kappa = \frac{M_3}{s^3} = \frac{n\mu_3}{n^{\frac32}\sigma^3} = \frac{\kappa'}{\sqrt n} \tag{39}$$

where κ' is for the group the value of “κ” as defined in
formula (8).

κ is the same for the sum and for the average of n items.
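Formulae (37) to (39) may be checked in the same way for a small group. In the Python sketch below the group `[0, 1, 1, 6]` and the number of items n = 4 are arbitrary assumptions, and the helper name `central_moment` is hypothetical; squares of κ are compared so that no square roots are needed.

```python
from fractions import Fraction
from itertools import product

def central_moment(values, order):
    """Exact moment of the given order about the average of a finite group."""
    values = [Fraction(v) for v in values]
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)

group = [0, 1, 1, 6]      # an arbitrary skew group
n = 4                     # number of independent selections

# Every ordered selection of n items, each equally frequent.
sums = [sum(choice) for choice in product(group, repeat=n)]
averages = [Fraction(s, n) for s in sums]

var_group, mu3_group = central_moment(group, 2), central_moment(group, 3)
var_sum, M3_sum = central_moment(sums, 2), central_moment(sums, 3)
var_avg, M3_avg = central_moment(averages, 2), central_moment(averages, 3)

# kappa squared = (third moment)^2 / (second moment)^3, kept exact.
kappa_sq_group = mu3_group ** 2 / var_group ** 3
kappa_sq_sum = M3_sum ** 2 / var_sum ** 3
kappa_sq_avg = M3_avg ** 2 / var_avg ** 3
print(var_sum, var_avg, kappa_sq_sum)
```

The enumeration shows the squared standard deviation multiplied by n for the sum and divided by n for the average, and κ² divided by n in both cases.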


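Genesis of the Curve of Error.

We now proceed to the analysis which leads to the
application of the curve of error. A quite simple case, which links up
the two parts of this chapter, is as follows.

If the original groups are represented by normal curves of
error, it can be shown that their sum and average are also
normal.

For, if we write $x_t = u_t + v_t$, the chance of the concurrence
of values of the parts $u_t$, $v_t$ is

$$\frac{1}{\sigma_u\sqrt{2\pi}}\, e^{-\frac{u^2}{2\sigma_u^2}}\,\delta u \times \frac{1}{\sigma_v\sqrt{2\pi}}\, e^{-\frac{v^2}{2\sigma_v^2}}\,\delta v
= \frac{1}{2\pi\sigma_u\sigma_v}\, e^{-\frac{u^2}{2\sigma_u^2} - \frac{(x-u)^2}{2\sigma_v^2}}\,\delta u\,\delta x.$$

The whole chance of x (± δx) is obtainable by integrating
this expression for all values of u, and equals

$$\frac{1}{2\pi\sigma_u\sigma_v}\int_{-\infty}^{\infty}
e^{-\frac{\sigma_u^2+\sigma_v^2}{2\sigma_u^2\sigma_v^2}\left(u - \frac{x\sigma_u^2}{\sigma_u^2+\sigma_v^2}\right)^2}\cdot
e^{-\frac{x^2}{2(\sigma_u^2+\sigma_v^2)}}\,du \cdot \delta x$$

$$= \left(\text{writing } u' \text{ for } u - \frac{x\sigma_u^2}{\sigma_u^2+\sigma_v^2}\right)\quad
\frac{1}{2\pi\sigma_u\sigma_v}\, e^{-\frac{x^2}{2 s_2^2}}\,\delta x \int_{-\infty}^{\infty} e^{-\frac{\sigma_u^2+\sigma_v^2}{2\sigma_u^2\sigma_v^2}\,u'^2}\,du'$$

$$= (\text{using formula p. 268})\quad \frac{1}{s_2\sqrt{2\pi}}\, e^{-\frac{x^2}{2 s_2^2}}\,\delta x, \quad\text{where } s_2^2 = \sigma_u^2 + \sigma_v^2.$$

The chance of the value x is, therefore,

$$\frac{1}{s_2\sqrt{2\pi}}\, e^{-\frac{x^2}{2 s_2^2}}\,\delta x \tag{40}$$

The process is easily generalised by induction, and the
chances of obtaining x from the sum and average of n
independent selections from a normal curve whose standard deviation
is σ are respectively

$$\frac{1}{\sigma\sqrt{2\pi n}}\, e^{-\frac{x^2}{2n\sigma^2}}\,\delta x
\quad\text{and}\quad
\frac{\sqrt n}{\sigma\sqrt{2\pi}}\, e^{-\frac{n x^2}{2\sigma^2}}\,\delta x \tag{41}$$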
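The integration that leads to formula (40) can be imitated numerically. The Python sketch below is a check under stated assumptions, not the author's method: the standard deviations 1.5 and 2 are arbitrary, the function names are hypothetical, and the trapezoidal rule over a wide finite range replaces the infinite integral.

```python
from math import exp, pi, sqrt

def normal(x, sd):
    """Ordinate of the normal curve of error with standard deviation sd."""
    return exp(-x * x / (2 * sd * sd)) / (sd * sqrt(2 * pi))

def chance_of_sum(x, sd_u, sd_v, half_width=12.0, steps=4000):
    """Integrate normal(u) * normal(x - u) over u by the trapezoidal rule."""
    lo = -half_width * sd_u
    du = 2 * half_width * sd_u / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * du
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * normal(u, sd_u) * normal(x - u, sd_v)
    return total * du

sd_u, sd_v = 1.5, 2.0                  # illustrative values
s2 = sqrt(sd_u ** 2 + sd_v ** 2)       # 2.5, by formula (34)
for x in (0.0, 1.0, 2.5, 5.0):
    print(x, chance_of_sum(x, sd_u, sd_v), normal(x, s2))
```

The two printed columns should coincide to many decimal places, the sum of the two normal groups being itself normal with standard deviation $s_2$.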


The same result is obtainable as a first approximation when
the original curves are not normal, but satisfy certain
conditions which are obtained in the analysis. The result is so
important that two proofs are given in the following paragraphs.


Proof by the Multinomial Theorem.


In this proof it is shown that the moments obtained by 
an extension of the method of the preceding paragraphs 
(formulae (33) to (39)) are for all orders the same as those of a 
normal curve of error with appropriate standard deviation. 

Let there be n elemental groups containing $m_1, m_2 \ldots m_n$
measurable things respectively; in any, the t-th, group, let
the average, the standard deviation and the moments about
the average be $\bar u_t$, $\sigma_t$, ${}_t\mu_3$, ${}_t\mu_4 \ldots$, and let the items be

$$\bar u_t + {}_tu_1,\quad \bar u_t + {}_tu_2,\ \ldots,\ \bar u_t + {}_tu_s,\ \ldots$$

Then ${}_tu_1 + {}_tu_2 + \cdots + {}_tu_s + \cdots = 0$.

One item is selected at random from each group, and n
such items are added; it is assumed that the selections from
different groups are independent of each other, and that the
chance of obtaining a particular magnitude from one group is
not affected by previous selections.

In the s-th selection the sum is $H + E_s$, where

$$H = \bar u_1 + \bar u_2 + \cdots + \bar u_n, \quad\text{and}\quad E_s = {}_1u_s + {}_2u_s + \cdots + {}_nu_s.$$

Let $s, M_3, M_4 \ldots$ be the standard deviation and moments of
the frequency curve of $E_s$, that is, of the frequency curve of
the sum.

$M_2 = s^2$ = mean of all possible values of $({}_1u_s + {}_2u_s + \cdots + {}_nu_s)^2$.

There are $m_1 \times m_2 \times \cdots \times m_n = N$, say, such values. Then
generalising the process of p. 288,

$$M_2 = s^2 = \frac{1}{N}\left\{\frac{N}{m_1}\, S\,{}_1u^2 + \frac{N}{m_2}\, S\,{}_2u^2 + \cdots\right\}
+ \frac{2}{N}\left\{\frac{N}{m_1 m_2}\, S\,{}_1u \cdot S\,{}_2u + \cdots\right\}$$

$$= \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2, \quad\text{since } 0 = S\,{}_1u = S\,{}_2u = \cdots$$

Similarly

$$M_3 = \frac{1}{N}\, S({}_1u_s + {}_2u_s + \cdots + {}_nu_s)^3
= \frac{1}{N}\left\{\frac{N}{m_1}\, S\,{}_1u^3 + \frac{N}{m_2}\, S\,{}_2u^3 + \cdots\right\}
+ \frac{3}{N}\left\{\frac{N}{m_1 m_2}\, S\,{}_1u \cdot S\,{}_2u^2 + \cdots\right\} + \cdots$$

$$= {}_1\mu_3 + {}_2\mu_3 + \cdots + {}_n\mu_3, \quad\text{since } 0 = S\,{}_1u, \text{ etc.} \tag{42}$$
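In the same way, when the fourth power is expanded every term
which contains the first power of any one u (such as
$S\,{}_1u \cdot S\,{}_2u \cdot S\,{}_3u \cdot S\,{}_4u$) vanishes, since $0 = S\,{}_1u$, etc., and

$$M_4 = {}_1\mu_4 + {}_2\mu_4 + \cdots + {}_n\mu_4 + 6(\sigma_1^2\sigma_2^2 + \sigma_1^2\sigma_3^2 + \cdots) \tag{43}$$

$$M_4 - 3s^4 = S\,{}_t\mu_4 + 6(\sigma_1^2\sigma_2^2 + \cdots) - 3(\sigma_1^2 + \sigma_2^2 + \cdots)^2 = S({}_t\mu_4 - 3\sigma_t^4).$$

If the standard deviations and moments of the elemental
curves are equal, so that $\sigma_1 = \sigma_2 = \cdots = \sigma$, ${}_1\mu_3 = {}_2\mu_3 = \cdots = \mu_3$,
etc., we have

$$s \text{ for the sum} = \sigma\sqrt n, \qquad \sigma_a \text{ for the average} = \frac{\sigma}{\sqrt n}, \tag{44}$$

$$\kappa, \text{ for sum or average,} = \frac{M_3}{s^3} = \frac{n\mu_3}{n^{\frac32}\sigma^3} = \frac{\kappa'}{\sqrt n}, \tag{45}$$

where κ' is the “κ” for the elemental curves.

$$\kappa_2 - 3 = \frac{M_4}{s^4} - 3 = \frac{n(\mu_4 - 3\sigma^4)}{n^2\sigma^4} = \frac{1}{n}(\kappa_2' - 3), \tag{46}$$

$\kappa_2$ being taken for the sum and $\kappa_2'$ for the elemental curves.

Hence κ tends to zero as $\sqrt n$ becomes large, and $\kappa_2$ may be taken
as 3 if $\frac{1}{n}$ is negligible.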

To find higher moments, we need to evaluate $M_t$ for any
integer t; that is the mean of $({}_1u + {}_2u + \cdots + {}_nu)^t$, which
(by the multinomial theorem *) is the mean of

$$\frac{t!}{n_1!\,n_2!\cdots}\ {}_1u^{n_1} \cdot {}_2u^{n_2} \cdots$$

when all possible terms subject to the condition $n_1 + n_2 + \cdots = t$
are summed.

* The multinomial theorem is an extension of the binomial theorem; the
following is an outline of the proof.

The product of t factors

$$(a_1 + b_1 + \cdots)(a_2 + b_2 + \cdots)\cdots(a_t + b_t + \cdots)$$

= the sum of all possible terms such as $a_1 b_2 c_3 d_4 b_5 \ldots k_t$, each suffix
occurring once.

The number of such terms in which an a occurs $n_1$ times, a b $n_2$ times ...
is the number of permutations of t things taken altogether in which $n_1$ are
alike, $n_2$ alike ... i.e. $\dfrac{t!}{n_1!\,n_2!\cdots}$. Now write
$a_1 = a_2 = \cdots = {}_1u$, $b_1 = b_2 = \cdots = {}_2u$,
etc., and we obtain the result in the text.

First take the case where t is even.

$M_{2t}$ = sum of means of the terms
$\dfrac{(2t)!}{n_1!\,n_2!\cdots}\ {}_1u^{n_1} \cdot {}_2u^{n_2} \cdots$
when $n_1 + n_2 + \cdots = 2t$

= sum of terms $\dfrac{(2t)!}{n_1!\,n_2!\cdots}$ · mean ${}_1u^{n_1}$ × mean ${}_2u^{n_2}$ × ...,

since from the independence of selection from the different
elemental curves each ${}_1u$ occurs with each ${}_2u$ ... with equal
frequency.

∴ $M_{2t}$ = sum of terms

$$\frac{(2t)!}{n_1!\,n_2!\cdots}\ {}_1\mu_{n_1} \cdot {}_2\mu_{n_2} \cdots$$

Now restrict the analysis to the case where the standard
deviations and moments of the elemental curves are equal, so
that ${}_1\mu_{n_1} = {}_2\mu_{n_1} = \cdots = \mu_{n_1}$, etc.

Let there be f factors ${}_1\mu_{n_1} \ldots$ in any selected term. Then
such a term occurs ${}_nC_f$ times in the various guises

$${}_1\mu_{n_1} \times {}_2\mu_{n_2} \times {}_3\mu_{n_3} \cdots,\quad
{}_1\mu_{n_1} \times {}_2\mu_{n_2} \times {}_4\mu_{n_3} \cdots,\quad \ldots$$

each of which is identical with

$$\mu_{n_1} \times \mu_{n_2} \times \mu_{n_3} \cdots$$

Hence $M_{2t}$ = Sum of terms

$${}_nC_f\ \frac{(2t)!}{n_1!\,n_2!\cdots} \times \mu_{n_1} \times \mu_{n_2} \cdots$$

where all values are taken subject to the condition

$$n_1 + n_2 + \cdots = 2t.$$

Now $s^2 = n\sigma^2$, $s^{2t} = n^t \cdot \sigma^{2t}$, and

$${}_nC_f = n(n-1)\cdots(n-f+1)/f!$$

∴ $\dfrac{M_{2t}}{s^{2t}}$ = sum of terms

$$\frac{(2t)!}{n_1!\,n_2!\cdots} \cdot \frac{n(n-1)\cdots(n-f+1)}{f!\; n^t} \cdot \frac{\mu_{n_1}}{\sigma^{n_1}} \cdot \frac{\mu_{n_2}}{\sigma^{n_2}} \cdots$$
It is now necessary to restrict the elemental curves so as to
satisfy the condition that $\dfrac{\mu_p}{\sigma^p}$ is finite for all values of p,
i.e. that Mean $\left(\dfrac{u}{\sigma}\right)^p$ is finite, or that the effective range of the
curve is comparable with its standard deviation. We have then
to consider which of the possible terms is finite and which of
order $\frac{1}{n}$ or higher.

Since $\mu_1$ is 0, each of $n_1$, $n_2$, etc. is 2 or more in every term
that is not identically zero; hence since the sum of the f terms
$n_1, n_2 \ldots$ is 2t, the greatest possible number of such terms
is t and $f \not> t$.

If f < t, the fraction $\dfrac{n(n-1)\cdots(n-f+1)}{n^t}$ is of order $\frac{1}{n}$ or higher.

If f = t, then $2 = n_1 = n_2 \ldots$, and we obtain, as the only term
when $\frac{1}{n}$ is neglected,

$$\frac{(2t)!}{(2!)^t} \cdot \frac{1}{t!} \cdot \frac{n(n-1)\cdots(n-t+1)}{n^t} \cdot \left(\frac{\mu_2}{\sigma^2}\right)^t = \frac{(2t)!}{2^t\, t!},$$

since $\mu_2 = \sigma^2$ and $\dfrac{n(n-1)\cdots(n-t+1)}{n^t}$
is between 1 and $1 - \dfrac{t(t-1)}{2n}$.

Hence

$$M_{2t} = s^{2t} \cdot \frac{(2t)!}{2^t\, t!} = 1 \cdot 3 \cdot 5 \cdots (2t-1)\, s^{2t} \tag{47}$$

when terms involving $\frac{1}{n}$ are neglected.


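By a similar argument

$$\frac{M_{2t+1}}{s^{2t+1}} = \text{Sum of terms}\quad
\frac{(2t+1)!}{n_1!\,n_2!\cdots} \cdot \frac{n(n-1)\cdots(n-f+1)}{f!\; n^{t+\frac12}} \cdot \frac{\mu_{n_1}}{\sigma^{n_1}} \cdot \frac{\mu_{n_2}}{\sigma^{n_2}} \cdots$$

Here there is no term not involving a power of n in the
denominator, and the greatest term is found when one of the
quantities $n_1, n_2 \ldots = 3$, and each of the others = 2; so that

2t + 1 = $n_1 + n_2 + \cdots$ = 2(f − 1) + 3 = 2f + 1, and f = t,

and we have f equal terms obtained by putting $n_1, n_2 \ldots$
successively = 3.

Then

$$\frac{M_{2t+1}}{s^{2t+1}} = t \times \frac{(2t+1)!}{2^{t-1}\, 3!} \cdot \frac{1}{t!\,\sqrt n} \cdot \frac{\mu_2^{\,t-1}\mu_3}{\sigma^{2t+1}} \tag{48}$$

∴ $M_{2t+1} = 0$ if terms involving $\frac{1}{\sqrt n}$ are neglected,

$$\text{and}\quad M_{2t+1} = \frac{t}{3} \cdot 1 \cdot 3 \cdot 5 \cdots (2t+1) \cdot M_3 \cdot s^{2t-2} \tag{49}$$

since $\dfrac{M_3}{s^3} = \dfrac{\mu_3}{\sqrt n\, \sigma^3} = \dfrac{\kappa'}{\sqrt n}$, if terms in
$\frac{1}{\sqrt n}$ are retained, and terms in $\frac{1}{n}$ neglected.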

These moments are (see formula (23) and Appendix, Note 6)
precisely those which are obtained from the curve

$$y = \frac{1}{s\sqrt{2\pi}}\, e^{-\frac{x^2}{2s^2}}$$

if $\frac{1}{\sqrt n}$ is neglected, and from the curve

$$y = \frac{1}{s\sqrt{2\pi}}\left[1 - \frac{\kappa}{2}\left(\frac{x}{s} - \frac{x^3}{3s^3}\right)\right] e^{-\frac{x^2}{2s^2}}$$

if $\frac{1}{\sqrt n}$ is retained and $\frac{1}{n}$ neglected, where $\kappa = \dfrac{M_3}{s^3}$.

Hence, if we may take identity of standard deviations and
of all moments as implying identity of curves, these equations
are the first and second approximations to the curve of
frequency required.
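The approach of the moments of a sum to those of the normal curve, as in formulae (46) and (47), can be watched in exact arithmetic. In the Python sketch below the elemental group (values 0, 1, 4 with chances 1/2, 3/10, 1/5) and n = 40 are arbitrary assumptions, and the helper names are hypothetical.

```python
from fractions import Fraction

def convolve(a, b):
    """Distribution of the sum of two independent selections (dict: value -> chance)."""
    out = {}
    for x, px in a.items():
        for y, py in b.items():
            out[x + y] = out.get(x + y, 0) + px * py
    return out

def central_moment(dist, order):
    """Exact moment of the given order about the average."""
    mean = sum(x * p for x, p in dist.items())
    return sum((x - mean) ** order * p for x, p in dist.items())

# An arbitrary skew elemental group, chosen for illustration.
element = {0: Fraction(1, 2), 1: Fraction(3, 10), 4: Fraction(1, 5)}
n = 40

total = {0: Fraction(1)}
for _ in range(n):
    total = convolve(total, element)       # sum of n independent selections

s2 = central_moment(total, 2)
ratio4_element = central_moment(element, 4) / central_moment(element, 2) ** 2
ratio4 = central_moment(total, 4) / s2 ** 2       # tends to 1*3 = 3
ratio6 = central_moment(total, 6) / s2 ** 3       # tends to 1*3*5 = 15
print(float(ratio4), float(ratio6))
```

With n = 40 the fourth and sixth ratios are already close to 3 and 15; increasing n brings them closer, the discrepancy being of order 1/n.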


Professor Edgeworth's Proof.

The proof given by Professor Edgeworth (“Law of Error,”
Camb. Phil. Trans., Vol. XX., Part I., 1904) is briefer and more
general, but it involves rather more difficult mathematical
conceptions, which it was the intention of the analysis just
given (which is essentially based on Edgeworth's work) to
avoid.

Edgeworth gives a formula for any number of successive 
approximations, but the outline which follows is confined to 
the first two. 

With the same notation and conditions as before, 
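Let $E_s = {}_1u_s + {}_2u_s + \cdots + {}_nu_s$.

Let α be any fixed small quantity, only used to select terms of
the same dimensions.

Then $e^{\alpha E_s} = e^{\alpha\,\cdot\,{}_1u_s} \cdot e^{\alpha\,\cdot\,{}_2u_s} \cdot e^{\alpha\,\cdot\,{}_3u_s} \cdots$ identically.

The mean value of $e^{\alpha\,\cdot\,{}_tu_s}$, that is the mean of

$$\left(1 + \alpha \cdot {}_tu + \frac{\alpha^2}{2!}\,{}_tu^2 + \frac{\alpha^3}{3!}\,{}_tu^3 + \cdots\right)$$

$$= 1 + \alpha \cdot {}_t\mu_1 + \frac{\alpha^2}{2!}\,{}_t\mu_2 + \frac{\alpha^3}{3!}\,{}_t\mu_3 + \cdots,$$

where ${}_t\mu_1 = 0$.

Since the selections from the different elemental curves are
independent, the mean of the product of
$e^{\alpha\,\cdot\,{}_1u} \times e^{\alpha\,\cdot\,{}_2u} \times \cdots$ = the
product of their means.

$$\therefore\ 1 + \alpha M_1 + \frac{\alpha^2}{2!}M_2 + \frac{\alpha^3}{3!}M_3 + \cdots = \text{Product of } n \text{ factors such as}\quad
\left(1 + \frac{\alpha^2}{2!}\,{}_t\mu_2 + \frac{\alpha^3}{3!}\,{}_t\mu_3 + \cdots\right)$$

$$\therefore\ \log\left(1 + \alpha M_1 + \frac{\alpha^2}{2!}M_2 + \cdots\right)
= S_t \log\left(1 + \frac{\alpha^2}{2!}\,{}_t\mu_2 + \frac{\alpha^3}{3!}\,{}_t\mu_3 + \cdots\right)$$

$$= \frac{\alpha^2}{2!}\, S\,{}_t\mu_2 + \frac{\alpha^3}{3!}\, S\,{}_t\mu_3 + \frac{\alpha^4}{4!}\, S\,{}_t\mu_4 + \cdots
- \frac12\, S\left(\frac{\alpha^2}{2!}\,{}_t\mu_2 + \cdots\right)^2 + \cdots$$

$$\therefore\ 1 + \alpha M_1 + \frac{\alpha^2}{2!}M_2 + \frac{\alpha^3}{3!}M_3 + \cdots
= e^{\frac{\alpha^2}{2} S\,{}_t\mu_2} \cdot e^{\frac{\alpha^3}{6} S\,{}_t\mu_3} \cdot e^{\frac{\alpha^4}{24}\left(S\,{}_t\mu_4 - 3 S\,{}_t\mu_2^2\right)} \cdots$$

$$= \left(1 + \frac{\alpha^2}{2} S\,{}_t\mu_2 + \cdots + \frac{1}{t!}\left(\frac{\alpha^2}{2} S\,{}_t\mu_2\right)^t + \cdots\right)
\left(1 + \frac{\alpha^3}{6} S\,{}_t\mu_3 + \cdots\right)
\left(1 + \frac{\alpha^4}{24}\left\{S\,{}_t\mu_4 - 3 S({}_t\mu_2)^2\right\} + \cdots\right)\cdots$$

Equate coefficients up to $\alpha^4$.

$M_1 = 0$

$s^2 = M_2 = S\,{}_t\mu_2 = S\sigma_t^2 = n\sigma^2$, if $\sigma^2$ is the mean of $\sigma_1^2, \sigma_2^2 \ldots$

$M_3 = S\,{}_t\mu_3 = n\mu_3$, if $\mu_3$ is mean of ${}_1\mu_3, {}_2\mu_3 \ldots$

$$\frac{M_4}{4!} = \frac12\left(\frac{s^2}{2}\right)^2 + \frac{1}{24}\left(S\,{}_t\mu_4 - 3 S\,{}_t\mu_2^2\right),$$

$$\therefore\ M_4 - 3s^4 = S({}_t\mu_4 - 3\sigma_t^4), \quad\text{and}\quad
\kappa_2 - 3 = \frac{M_4}{s^4} - 3 = \frac{1}{n^2\sigma^4}\, S({}_t\mu_4 - 3\sigma_t^4) = \frac{1}{n}(\kappa_2' - 3),$$

if $\kappa_2' - 3$ is mean $\dfrac{{}_t\mu_4 - 3\sigma_t^4}{\sigma^4}$.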

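Hence

$$\kappa = \frac{M_3}{s^3} = \frac{n\mu_3}{n^{\frac32}\sigma^3} = \frac{\kappa'}{\sqrt n}, \quad\text{where } \kappa' = \frac{\mu_3}{\sigma^3}.$$

$$1 + \frac{\alpha^2}{2}s^2 + \frac{\alpha^3}{3!}M_3 + \cdots
= \left(1 + \frac{\alpha^2 s^2}{2} + \cdots + \frac{1}{t!}\left(\frac{\alpha^2 s^2}{2}\right)^t + \cdots\right)
\left(1 + \frac{\alpha^3 s^3}{6} \cdot \frac{\kappa'}{\sqrt n} + \cdots\right)
\left(1 + \frac{\alpha^4 s^4}{24} \cdot \frac{1}{n}(\kappa_2' - 3) + \cdots\right)\cdots$$

On the right-hand side of the equation in every case the
index of α equals the suffix of μ or the sums of the suffixes
of powers or products of μ's.

Now assume that throughout the elemental curves $\dfrac{\mu_p}{\sigma^p}$ is
finite for all values of p, and it results that the coefficient of
$\alpha^p \cdot s^p$ contains the factor $n^{-\left(\frac{p}{2} - 1\right)}$, as has been worked out above
up to $\alpha^4$.

Neglect $\frac{1}{\sqrt n}$ and all higher powers,

$$1 + \frac{\alpha^2}{2}M_2 + \frac{\alpha^3}{3!}M_3 + \cdots = 1 + \frac{\alpha^2 s^2}{2} + \cdots + \alpha^{2t} \cdot \frac{s^{2t}}{t!\, 2^t} + \cdots$$

∴ every odd moment, $M_{2t+1} = 0$

and an even moment,

$$M_{2t} = (2t)!\ \frac{s^{2t}}{t!\, 2^t} = 1 \cdot 3 \cdots (2t-1) \cdot s^{2t} \tag{50}$$

as in the normal curve of error (formula (23)).

Now retain $\frac{1}{\sqrt n}$ and neglect $\frac{1}{n}$.

$M_{2t}$ is as before.

$$\frac{M_{2t+1}}{(2t+1)!} = \frac{1}{(t-1)!} \cdot \frac{s^{2t+1}}{2^{t-1}} \cdot \frac{\kappa'}{6\sqrt n},$$

that is, the (2t + 1)-th moment of the curve

$$y = \frac{1}{s\sqrt{2\pi}}\left\{1 - \frac{\kappa}{2}\left(\frac{x}{s} - \frac{x^3}{3s^3}\right)\right\} e^{-\frac{x^2}{2s^2}} \tag{51}$$

(see Appendix, Note 6).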
Hence, by the test of equality of moments, the curve of
frequency of the sum or average of n selections under the given
conditions has for its first approximation the normal curve,
when $\frac{1}{\sqrt n}$ is neglected, and for its second approximation the
skew curve already given.

Further approximations, which so far have been found
mainly of theoretic interest only, are given by Edgeworth.

Statement of the Generalised Law of Error, or the Law of Great
Numbers.

The theorems now proved can be summarised as follows, 
the conditions of validity being restated and amplified. 

Let there be a large number (n) of elemental groups, each 
of which can be represented by a frequency locus, such that 
the chance of obtaining a magnitude U by selection from a 
group is a function of U. 

Form a total, H, of n things, one selected from each group, 
so that the selection from one group has no (or very slight) 
effect on the selection from another ; and obtain many values 
of H by repeating the process, in such a way that the selec- 
tions which make one value of H are not affected by the 
selections which make other values.* 

Then if the frequency loci of the elemental groups satisfy
certain conditions, the frequency locus of H has a definite form
to which

$$y = \frac{1}{s\sqrt{2\pi}}\, e^{-\frac{x^2}{2s^2}}$$

is a first, and

$$y = \frac{1}{s\sqrt{2\pi}}\left\{1 - \frac{\kappa}{2}\left(\frac{x}{s} - \frac{x^3}{3s^3}\right)\right\} e^{-\frac{x^2}{2s^2}}$$

is a second approximation, where $s^2$ is the second moment and
$\kappa s^3$ the third moment of the locus.

The frequency locus of the average of the n magnitudes is
of the same form as that of the sum, and κ has the same value in
both cases. If $s_a$ is the standard deviation of the average,
$s_a = \dfrac{s}{n}$, where $s = \sigma\sqrt n$ and $s_a = \dfrac{\sigma}{\sqrt n}$ if σ is typical of the
standard deviation of the elemental curves. κ is of the order
$\frac{1}{\sqrt n}$ in comparison with 1 in the frequency equation of H, and
only the first approximation is necessary when n is very
great or when the elemental curves are symmetrical, in which
case κ = 0.

* If the selections from one elemental group are not independent, but the
magnitudes tend to come in batches, then more values of H are necessary
to obtain any given approximation to its final frequency form when an
indefinitely large number of values are taken.

The condition that must be satisfied by the elemental
curves is that, if $\mu_p$ is the p-th moment and σ the standard
deviation of any one of them, $\dfrac{\mu_p}{\sigma^p}$ is a small finite number * that
can be neglected when multiplied by $n^{-\left(\frac{p}{2} - 1\right)}$ for all values of p;
this is secured when the great bulk of the frequency curve is on
a base containing only a small multiple (1, 2 or 3) of its standard
deviation to left and right of its average. This condition is
quite generally satisfied by ordinary frequency groups when
n is at all large.

The first and the second approximations are only valid for
moderate values of $\dfrac{x}{s}$, since beyond these the contributions of
further approximations become sensible; it is only the
central portion of the frequency curve of H so generated that
is determinable; the outer portions have no general form, and
it can only be postulated that their aggregate volume is small,
and that the chance of exceeding, say 3s, is negligible. The
range that is to be understood by “the central portion”
depends on the value of n; as the number of independent
elements increases, so the range of the determinable form
extends. In ordinary cases with n as great as 100 it may
perhaps be said that the frequency curve is known over a range
of 2s on either side of the origin.

It follows that the applicability of the law of error to given 
observations should not be denied on the ground that the 
positions of extreme values do not conform to the law. 
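The gain from the second approximation over the central range can be illustrated with groups of zeros and units, for which the exact chances are the terms of the binomial. The Python sketch below is illustrative only: n = 100 and p = .1 are assumed values, and the function names are hypothetical.

```python
from math import comb, exp, pi, sqrt

def first_approximation(x, s):
    """Normal curve of error with standard deviation s."""
    return exp(-x * x / (2 * s * s)) / (s * sqrt(2 * pi))

def second_approximation(x, s, kappa):
    """Normal curve multiplied by 1 - (kappa/2)(x/s - x^3/3s^3)."""
    z = x / s
    return first_approximation(x, s) * (1 - kappa / 2 * (z - z ** 3 / 3))

n, p = 100, 0.1                       # illustrative: n groups of zeros and units
q = 1 - p
s = sqrt(p * q * n)                   # 3.0
kappa = (q - p) / sqrt(p * q * n)     # about 0.27

worst_first = worst_second = 0.0
for k in range(4, 17):                # deviations within 2s of the average
    exact = comb(n, k) * p ** k * q ** (n - k)
    x = k - n * p
    worst_first = max(worst_first, abs(exact - first_approximation(x, s)))
    worst_second = max(worst_second, abs(exact - second_approximation(x, s, kappa)))
print(worst_first, worst_second)
```

Over this range the greatest error of the second approximation is several times smaller than that of the first; outside the central portion neither curve should be relied on.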


* More exactly it is only the difference between this ratio and the
corresponding ratio in a normal curve that is involved.
Case when the Universe is Limited.

On pp. 287 seq. it was assumed that the selection of one
item did not affect the chance of further selections.

As on p. 282, we will now examine the case where the
universe from which the selection is made is limited, so far as
the determination of the average is concerned.

Let a group of n things be selected at random from a group of
N things, whose measurements are $\bar u + u_1, \bar u + u_2 \ldots \bar u + u_N$, where
$\bar u$ is the average and $S u_t = 0$. Write H + E for the sum of the
measurements of the n selected things, where $H = n\bar u$.

There are ${}_NC_n$ equally probable values of E, such as

$$u_1 + u_2 + u_3 + \cdots + u_n$$
$$u_2 + u_3 + u_4 + \cdots + u_{n+1}$$

The sum of the values is easily seen to be zero, and therefore
the mean value of E = 0.

Let s be standard deviation of E.

Then ${}_NC_n \cdot s^2$ = sum of ${}_NC_n$ squares, such as $(u_1 + u_2 + \cdots + u_n)^2$,
each containing n terms.

In the sum each square, such as $u_t^2$, occurs $\dfrac{n}{N} \times {}_NC_n$ times, and
each product $2u_su_t$ occurs $\dfrac{n(n-1)}{N(N-1)} \times {}_NC_n$ times, since in all
there are $n \times {}_NC_n$ squares and $\dfrac{n(n-1)}{2} \times {}_NC_n$ products.

$$\therefore\ {}_NC_n \cdot s^2 = \frac{n}{N} \times {}_NC_n \cdot S u_t^2 + 2 \cdot \frac{n(n-1)}{N(N-1)} \cdot {}_NC_n \cdot S u_s u_t$$

$$s^2 = \frac{n}{N} \cdot N\sigma^2 + \frac{n(n-1)}{N(N-1)}\left\{(S u_t)^2 - S u_t^2\right\},$$

where σ is standard deviation of the universe from which selection
was made,

$$= n\sigma^2 - \frac{n(n-1)}{N(N-1)}\, N\sigma^2, \quad\text{since } S u_t = 0$$

$$= \sigma^2 \cdot n\, \frac{N-n}{N-1} = \sigma^2 n\left(1 - \frac{n}{N}\right) \tag{52}$$

if $\frac{1}{N}$ is negligible.

Let $\sigma_a$ be the standard deviation of the average $\left(\dfrac{H}{n} + \dfrac{E}{n}\right)$ of the
n selections.
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Then *°-£-:^V( 1 -S} '• ( 53 ) 

• % 

•If N is indefinitely great, we obtain as before, formula (38) . 

v« 

ft 

• By neglecting ^ we exaggerate the standard deviation. 

It can be shown (Isserlis, Stat. Journal, 1918, pp. 75 seq.)
that the frequency of the sum or average is very approximately
normal, when N is large as is generally the case in practice.

If we use the table on p. 271 with the value $\frac{\sigma}{\sqrt{n}}$, we
exaggerate slightly throughout the chance that a deviation
exceeds any given amount.
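
The result for the limited universe can be checked by brute force on a small case. The following is only an illustrative sketch: the seven measurements are hypothetical, and the function names are introduced here for the purpose. It enumerates every equally probable selection of n things out of N and compares the variance of their sum with $n\sigma^2 (N-n)/(N-1)$, the exact expression from which formula (52) follows when 1/N is neglected.

```python
from itertools import combinations
from math import isclose

def sum_variance_by_enumeration(values, n):
    """Variance of the sum of n items over all equally probable selections."""
    sums = [sum(group) for group in combinations(values, n)]
    mean = sum(sums) / len(sums)
    return sum((total - mean) ** 2 for total in sums) / len(sums)

def sum_variance_by_formula(values, n):
    """s^2 = n * sigma^2 * (N - n) / (N - 1), the exact form behind formula (52)."""
    N = len(values)
    average = sum(values) / N
    sigma2 = sum((v - average) ** 2 for v in values) / N
    return n * sigma2 * (N - n) / (N - 1)

universe = [2, 3, 5, 7, 11, 13, 17]   # hypothetical measurements, N = 7
n = 3
exact = sum_variance_by_enumeration(universe, n)
formula = sum_variance_by_formula(universe, n)
print(exact, formula)   # the two values should agree
```

When n = N there is only one possible selection, and both functions return zero, as the factor (N - n) requires.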

Note. That the law of great numbers is obtainable from the
limit of the terms of $(p + q)^n$, as shown above, can be proved as a
special case of the general analysis.

Let each elemental group contain $qm$ zeros and $pm$ units, where
$p + q = 1$.

The constants of such a group are

$$\text{average} = \frac{qm \times 0 + pm \times 1}{qm + pm} = p.$$

[Diagram: $qm$ items at O (zero), $pm$ items at 1, with the average at A.]

$pm$, $qm$ are at distances $+q$ and $-p$ from the average, A.

$$\sigma^2 = \frac{pm \cdot q^2 + qm \cdot p^2}{(p + q)m} = pq, \qquad \sigma = \sqrt{pq},$$

$$\kappa = \frac{\mu_3}{\sigma^3} = \frac{pq(q - p)}{(pq)^{3/2}} = \frac{q - p}{\sqrt{pq}}.$$

Form a total by adding selections one from each of $n$ such
curves; this satisfies the conditions for the formation of H above.
The total has a frequency curve with average $pn$, standard
deviation $\sigma\sqrt{n} = \sqrt{pqn}$, and $\kappa = \frac{q - p}{\sqrt{pqn}}$, as was found on
p. 264.
Illustrative Examples.

In its integral form the law of great numbers, so far as the
second approximation, is

$$P(x) = \frac{1}{\sqrt{2\pi}} \int_0^z e^{-\frac{1}{2}z^2}\Big\{1 - \frac{\kappa}{2}\Big(z - \frac{z^3}{3}\Big)\Big\}\,dz$$

$$= \frac{1}{\sqrt{2\pi}} \int_0^z e^{-\frac{1}{2}z^2}\,dz - \frac{\kappa}{6\sqrt{2\pi}}\big\{1 - (1 - z^2)\,e^{-\frac{1}{2}z^2}\big\}$$

$$= F(z) - \kappa f(z),$$

where $P(x)$ is the chance of a positive deviation from the average
not exceeding $x$, $z = \frac{x}{\sigma}$, $F(z)$ is tabulated on p. 271, and

$$f(z) = \frac{1}{6\sqrt{2\pi}}\big\{1 - (1 - z^2)\,e^{-\frac{1}{2}z^2}\big\}$$

is tabulated on the next page.

$\kappa = \frac{\mu_3}{\sigma^3}$, and $\mu_3$ and $\sigma$ are the third moment and standard
deviation respectively of the curve, calculated either a priori or from
the observations.

Eight examples follow to illustrate the method of fitting 
the curve to observations. In the first two (words and bricks) 
the genesis of the measurements leads one to expect agreement 
with the law of great numbers ; in the next two (skulls and 
plaice) application to biometrical measurements is shown ; in the 
next (ages) there is an indirect relation to mental phenomena ; 
in the last three (speeds, food consumption, and prices) the 
nature of the variation is complex and sporadic, and the form 
of the frequency curve could not be forecast. 

Only the first example is worked in full. 


+ See Appendix, Note 6, for the integration. 
Table of Values of $f(x) = \frac{1}{6\sqrt{2\pi}}\big\{1 - (1 - x^2)\,e^{-\frac{1}{2}x^2}\big\}$


   x    f(x)  |    x    f(x)  |    x    f(x)  |    x    f(x)  |    x    f(x)
  .00  .0000  |   .50  .0225  |  1.00  .0665  |  1.50  .0935  |  2.00  .0935
  .01  .0000  |   .51  .0233  |  1.01  .0673  |  1.51  .0937  |  2.02  .0931
  .02  .0000  |   .52  .0241  |  1.02  .0681  |  1.52  .0939  |  2.04  .0927
  .03  .0001  |   .53  .0249  |  1.03  .0689  |  1.53  .0942  |  2.06  .0923
  .04  .0002  |   .54  .0258  |  1.04  .0697  |  1.54  .0944  |  2.08  .0919
  .05  .0003  |   .55  .0266  |  1.05  .0704  |  1.55  .0945  |  2.10  .0915
  .06  .0004  |   .56  .0275  |  1.06  .0712  |  1.56  .0947  |  2.12  .0911
  .07  .0005  |   .57  .0283  |  1.07  .0719  |  1.57  .0949  |  2.14  .0906
  .08  .0006  |   .58  .0292  |  1.08  .0727  |  1.58  .0951  |  2.16  .0902
  .09  .0008  |   .59  .0301  |  1.09  .0734  |  1.59  .0952  |  2.18  .0897
  .10  .0010  |   .60  .0310  |  1.10  .0741  |  1.60  .0953  |  2.20  .0892
  .11  .0012  |   .61  .0318  |  1.11  .0748  |  1.61  .0955  |  2.22  .0887
  .12  .0014  |   .62  .0327  |  1.12  .0755  |  1.62  .0956  |  2.24  .0882
  .13  .0017  |   .63  .0336  |  1.13  .0762  |  1.63  .0957  |  2.26  .0877
  .14  .0019  |   .64  .0345  |  1.14  .0769  |  1.64  .0958  |  2.28  .0872
  .15  .0022  |   .65  .0354  |  1.15  .0776  |  1.65  .0959  |  2.30  .0867
  .16  .0025  |   .66  .0363  |  1.16  .0782  |  1.66  .0959  |  2.32  .0862
  .17  .0028  |   .67  .0372  |  1.17  .0789  |  1.67  .0960  |  2.34  .0857
  .18  .0032  |   .68  .0381  |  1.18  .0795  |  1.68  .0960  |  2.36  .0853
  .19  .0035  |   .69  .0390  |  1.19  .0801  |  1.69  .0961  |  2.38  .0848
  .20  .0039  |   .70  .0399  |  1.20  .0807  |  1.70  .0961  |  2.40  .0843
  .21  .0043  |   .71  .0409  |  1.21  .0813  |  1.71  .0961  |  2.42  .0838
  .22  .0047  |   .72  .0418  |  1.22  .0819  |  1.72  .0961  |  2.44  .0833
  .23  .0052  |   .73  .0427  |  1.23  .0825  |  1.73  .0962  |  2.46  .0828
  .24  .0056  |   .74  .0436  |  1.24  .0831  |  1.74  .0962  |  2.48  .0823
  .25  .0061  |   .75  .0445  |  1.25  .0836  |  1.75  .0962  |  2.50  .0818
  .26  .0066  |   .76  .0455  |  1.26  .0842  |  1.76  .0961  |  2.52  .0814
  .27  .0071  |   .77  .0464  |  1.27  .0847  |  1.77  .0961  |  2.54  .0809
  .28  .0076  |   .78  .0473  |  1.28  .0852  |  1.78  .0961  |  2.56  .0804
  .29  .0081  |   .79  .0482  |  1.29  .0857  |  1.79  .0960  |  2.58  .0800
  .30  .0086  |   .80  .0491  |  1.30  .0862  |  1.80  .0960  |  2.60  .0795
  .31  .0092  |   .81  .0500  |  1.31  .0867  |  1.81  .0959  |  2.62  .0791
  .32  .0098  |   .82  .0509  |  1.32  .0871  |  1.82  .0958  |  2.64  .0787
  .33  .0104  |   .83  .0518  |  1.33  .0876  |  1.83  .0958  |  2.66  .0782
  .34  .0110  |   .84  .0527  |  1.34  .0880  |  1.84  .0957  |  2.68  .0778
  .35  .0116  |   .85  .0536  |  1.35  .0885  |  1.85  .0956  |  2.70  .0774
  .36  .0122  |   .86  .0545  |  1.36  .0889  |  1.86  .0955  |  2.72  .0770
  .37  .0129  |   .87  .0554  |  1.37  .0893  |  1.87  .0954  |  2.74  .0766
  .38  .0136  |   .88  .0563  |  1.38  .0897  |  1.88  .0953  |  2.76  .0762
  .39  .0142  |   .89  .0572  |  1.39  .0901  |  1.89  .0952  |  2.78  .0759
  .40  .0149  |   .90  .0581  |  1.40  .0904  |  1.90  .0950  |  2.80  .0755
  .41  .0156  |   .91  .0589  |  1.41  .0908  |  1.91  .0949  |  2.82  .0752
  .42  .0164  |   .92  .0598  |  1.42  .0912  |  1.92  .0948  |  2.84  .0748
  .43  .0171  |   .93  .0607  |  1.43  .0915  |  1.93  .0946  |  2.86  .0745
  .44  .0178  |   .94  .0616  |  1.44  .0918  |  1.94  .0945  |  2.88  .0742
  .45  .0186  |   .95  .0624  |  1.45  .0922  |  1.95  .0943  |  2.90  .0738
  .46  .0193  |   .96  .0632  |  1.46  .0924  |  1.96  .0942  |  2.92  .0735
  .47  .0201  |   .97  .0640  |  1.47  .0927  |  1.97  .0940  |  2.94  .0732
  .48  .0209  |   .98  .0649  |  1.48  .0930  |  1.98  .0938  |  2.96  .0730
  .49  .0217  |   .99  .0657  |  1.49  .0932  |  1.99  .0937  |  2.98  .0727

   x    f(x)  |    x    f(x)
 3.00  .0724  |  3.80  .0671
 3.20  .0702  |  4.00  .0668
 3.40  .0687  |  4.20  .0666
 3.60  .0677  |   inf  .0665
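
Entries of this table can be recomputed directly from the definition of f(x). The short sketch below is an illustration only (the Python function name `f` simply mirrors the symbol in the text); it evaluates the expression and prints a few values to four places for comparison with the table.

```python
from math import exp, inf, pi, sqrt

def f(x):
    """f(x) = {1 - (1 - x^2) e^(-x^2/2)} / (6 sqrt(2 pi)), as tabulated above."""
    constant = 1.0 / (6.0 * sqrt(2.0 * pi))
    if x == inf:
        # The exponential term vanishes in the limit, leaving the constant.
        return constant
    return constant * (1.0 - (1.0 - x * x) * exp(-0.5 * x * x))

for x in (0.5, 1.0, 2.0, 3.0, inf):
    print(f"{x:5.2f}  {f(x):.4f}")
```

At x = 1 the factor (1 - x^2) vanishes, so f(1) equals the limiting value at infinity; this is why .0665 appears twice in the table.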
1. A lengthy book was selected, and the number of letters in
each of the first completed words in 10,000 consecutive lines
was noted (A); also the total number of letters in the
1000 batches obtained by adding the first 10 entries, the
second 10, etc. (B); and 100 totals were similarly obtained by
adding batches of 100 (C).

The curve of frequency of A is purely observational, and
its form cannot be foretold; that of B tends to satisfy the
conditions under which the law of great numbers appears, but
"n" is only 10, and unless A is nearly normal the form can
only be foretold in the central region; in C, with "n" 100,
the second approximation should fit over a considerable
region, and the first approximation will be sufficient if A is
fairly symmetrical.


A. Distribution of 10,000 Words According to the Numbers of
Letters in them.

(z and F(z) are entered for the lower limit of each class.)

Number of  Class limits     x   Observ-      xy      x^2 y      x^3 y      z    F(z)*   Diff.
letters                         ations y                                                x 10,000
    1        .5 to  1.5    -7      127      -889      6,223    -43,561   -1.62   .447     490
    2       1.5 to  2.5    -6    1,792   -10,752     64,512   -387,072   -1.27   .398     770
    3       2.5 to  3.5    -5    1,984    -9,920     49,600   -248,000    -.92   .321   1,020
    4       3.5 to  4.5    -4    1,240    -4,960     19,840    -79,360    -.58   .219   1,280
    5       4.5 to  5.5    -3      968    -2,904      8,712    -26,136    -.23   .091   1,390
    6       5.5 to  6.5    -2      812    -1,624      3,248     -6,496    +.12   .048   1,330
    7       6.5 to  7.5    -1      893      -893        893       -893    +.47   .181   1,130
    8       7.5 to  8.5     0      634         0          0          0    +.82   .294     850
    9       8.5 to  9.5     1      602      +602        602       +602   +1.17   .379     570
   10       9.5 to 10.5     2      460      +920      1,840     +3,680   +1.53   .436     330
   11      10.5 to 11.5     3      260      +780      2,340     +7,020   +1.87   .469     180
   12      11.5 to 12.5     4      116      +464      1,856     +7,424   +2.22   .487      80
   13      12.5 to 13.5     5       69      +345      1,725     +8,625   +2.57   .495      30
   14      13.5 to 14.5     6       21      +126        756     +4,536   +2.92   .498      10
   15      14.5 to 15.5     7       18      +126        882     +6,174   +3.27   .499      10
   16      15.5 to 16.5     8        4       +32        256     +2,048   +3.62   .500       0

 Totals                         10,000   -31,942    163,285   -791,518
                                          +3,395               +40,109
                                         -------              --------
                                         -28,547              -751,409
$M = -2.8547$. Average is $8 + M = 5.1453$.

$\mu_2 = 16.3285 - M^2 = 8.1792$, $\sigma = 2.860$.

$\mu_3 = -75.1409 - 3(-2.8547)(16.3285) + 2(-2.8547)^3 = 18.1704$.

$\kappa = \frac{\mu_3}{\mu_2^{3/2}} = .78$.


For calculating the moments an arbitrary origin has been
taken at 8.

In fitting the normal curve $z = \frac{x - M}{\sigma}$. The first entry
$F(z) = .447$ shows the proportion included between the average
and .5 letters ($x = -7.5$). The normal curve gives 530 instances
below .5 letters, and in other respects the last column is not a
close approximation to $y$, the observations. The original curve
is not normal, but it is unimodal and continuous, and in spite
of its skewness the great bulk is contained in the limits
$\bar{x} \pm 2\sigma$. Hence we have all the conditions for obtaining the
law of great numbers if we add elements taken at random from
the curve.

* Table, p. 271.
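
The constants quoted for distribution A can be recomputed from the observed frequencies. The sketch below is illustrative only: the list of counts is the y column as read from Table A (the entry for 14 letters is taken as 21, the value implied by the total of 10,000 and by xy = 126), and the helper name is introduced here.

```python
from math import sqrt

# Observed frequencies y for words of 1, 2, ..., 16 letters (Table A).
counts = [127, 1792, 1984, 1240, 968, 812, 893, 634,
          602, 460, 260, 116, 69, 21, 18, 4]
ORIGIN = 8  # arbitrary origin used in the text

def moments_about_origin(counts, origin):
    """First three moments of the deviations x = letters - origin."""
    total = sum(counts)
    deviations = [letters - origin for letters in range(1, len(counts) + 1)]
    return tuple(
        sum(y * x ** k for x, y in zip(deviations, counts)) / total
        for k in (1, 2, 3)
    )

m1, m2, m3 = moments_about_origin(counts, ORIGIN)
average = ORIGIN + m1                      # 8 + M
mu2 = m2 - m1 ** 2                         # second moment about the average
mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3       # third moment about the average
sigma = sqrt(mu2)
kappa = mu3 / mu2 ** 1.5
print(round(average, 4), round(mu2, 4), round(sigma, 3), round(mu3, 4), round(kappa, 2))
```

The third moment is obtained from the moments about the arbitrary origin by the same reduction as in the text, so the printed values can be set against M, the second and third moments, the standard deviation and the skewness constant given above.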


B. Distribution of 1000 Sums of the Letters in 10 Words.

(z and the three following columns are entered for the lower limit of each class.)

Number of          z       F(z)    f(z)†   F(z) - kappa f(z)*   Differences   Observa-   Differences
letters.                                                        x 1000        tions.     x 1000
                                                                (1st approx.)            (2nd approx.)
Below 26.5                                                           4            0           0
26.5 to 31.5    -2.650     .496    .078        .528                 13            8           8
31.5 to 36.5    -2.119     .483    .091        .520                 39           38          37
36.5 to 41.5    -1.588     .444    .095        .483                 89           97          99
41.5 to 46.5    -1.057     .355    .071        .384                154          155         173
46.5 to 51.5     -.526     .201    .025        .211                203          227         213
51.5 to 56.5     +.005     .002    .000        .002                202          202         191
56.5 to 61.5     +.536     .204    .026        .193                153          134         135
61.5 to 66.5    +1.067     .357    .072        .328                 88           76          78
66.5 to 71.5    +1.598     .445    .095        .406                 38           37          40
71.5 to 76.5    +2.129     .483    .091        .446                 13           13          18
76.5 to 81.5    +2.660     .496    .078        .464                  3            9           7
81.5 to 86.5    +3.191     .499    .069        .471                  1            3           2
Above 86.5      +3.722     .500    .067        .473                  0            1           0

* + for negative values of x.   † Table, p. 303.

For the 1000 sums the average is 51.453, $\sigma = 9.4155$,
$\kappa = .4093$. The sums are all between 26 and 87. The calculation
of the columns z, F(z) and Differences is on the same
method as for A. The normal curve now fits much better,
and in the range 31.5 to 76.5, that is, average ± 2σ, there is
nothing to be desired; but the formula gives too many below
31.5 and too few above 76.5.

The second approximation gives a very close fit throughout,
except that it fails to stretch so as to include the one entry
above 86.5 (see p. 432 for test of fit).
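
The second-approximation columns of Table B can be reproduced numerically. What follows is a minimal sketch under stated assumptions: F is computed from the error function rather than read from the table on p. 271, the function names are introduced here, and the sign of the skewness term (subtracted above the average, added below it) is the convention read off the tabulated figures and the footnote to Table B.

```python
from math import erf, exp, pi, sqrt

MEAN, SIGMA, KAPPA = 51.453, 9.4155, 0.4093   # constants quoted for Table B

def F(z):
    """Normal area between the average and a deviation z >= 0."""
    return 0.5 * erf(z / sqrt(2.0))

def f(z):
    """Tabulated skewness function {1 - (1 - z^2) e^(-z^2/2)} / (6 sqrt(2 pi))."""
    return (1.0 - (1.0 - z * z) * exp(-0.5 * z * z)) / (6.0 * sqrt(2.0 * pi))

def area_from_average(limit):
    """Second-approximation area between the average and a class limit."""
    z = (limit - MEAN) / SIGMA
    a = abs(z)
    # Assumed convention: minus above the average, plus below it.
    return F(a) - KAPPA * f(a) if z >= 0 else F(a) + KAPPA * f(a)

def expected_count(lower, upper, total=1000):
    """Expected number of sums between two class limits."""
    def signed_area(limit):
        area = area_from_average(limit)
        return area if limit >= MEAN else -area
    return total * (signed_area(upper) - signed_area(lower))

print(round(area_from_average(31.5), 3))       # compare with the entry at 31.5
print(round(expected_count(41.5, 46.5)))       # compare with the class 41.5 to 46.5
```

With these constants the class 41.5 to 46.5 comes out close to the 173 shown in the last column, against 154 from the normal curve alone and 155 observed.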
We should have expected to find that the standard deviation
and κ of these observations were the standard deviation
(2.860) and κ (.78) of A multiplied respectively by $\sqrt{10}$ and
$\frac{1}{\sqrt{10}}$ (formulae (37) and (39)).

But $2.860 \times \sqrt{10} = 9.04$ and $.78 \div \sqrt{10} = .25$, whereas we
get from the B observations 9.42 and .41. This points to a
failure of complete independence in the aggregation of the
10 words; and analysis shows that the author's style
changes from the earlier to the later part of the book, so that
there is some correlation between 10 words taken consecutively.
In fact, when we sum 100 words consecutively as in C, we get
σ = 33.311 instead of $2.86 \times \sqrt{100}$, while when the order of
summation was re-arranged so as to include entries from all
parts of the book in each 100, σ was 28.87, which accords
with theory.
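
The comparison just made rests on two scaling rules for the sum of n independent items: the standard deviation is multiplied by the square root of n and κ is divided by it. A small sketch, with a helper name introduced only for illustration, reproduces the figures expected under independence:

```python
from math import sqrt

def scale_to_sum(sigma, kappa, n):
    """Constants expected for the sum of n independent items:
    standard deviation sigma * sqrt(n), skewness kappa / sqrt(n)."""
    return sigma * sqrt(n), kappa / sqrt(n)

sigma_10, kappa_10 = scale_to_sum(2.860, 0.78, 10)
sigma_100, kappa_100 = scale_to_sum(2.860, 0.78, 100)
print(round(sigma_10, 2), round(kappa_10, 2), round(sigma_100, 2))
```

The observed 9.42 and 33.311 for consecutive words both exceed these values, which is the sign of correlation between neighbouring words that the text describes; the re-arranged summation (28.87) lies close to the value for 100 independent items.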

C. Distribution of 100 Totals of the Letters in 100 Words.

(z and F(z) are entered for the lower limit of each class.)

Number of          z       F(z)    Differences   Observa-
letters.                           x 100.        tions.
415 to 435      -3.001     .499        .7            1
435 to 455      -2.400     .492       2.8            2
455 to 475      -1.800     .464       7.9            7
475 to 495      -1.200     .385      16.0           19
495 to 515       -.599     .225      22.5           25
515 to 535       -.001     .000      22.6           18
535 to 555       +.601     .226      15.9           18
555 to 575      +1.202     .385       7.9            6
575 to 595      +1.803     .464       2.8            3
595 to 615      +2.403     .492        .7            0
615 to 635      +3.003     .499        .1            1
635             +3.604     .500


The agreement between formula and observations in this
table is very close (see p. 432), and cannot be improved
perceptibly by using the second approximation.

This experiment, which was devised with the definite
intention of illustrating the law of great numbers (and the
correlation surface, formula (102)), has thus proved to be
completely satisfactory, even in that it also illustrates the
difficulty of securing random selection.

2. In a garden the paths are bordered by bricks originally
laid (but not mortared) lengthways touching each other. After
they had been exposed for some time to the influences of
weather and of gardening operations, the lengths occupied by
143 sequences of 4 bricks were measured as nearly as possible
to the nearest sixteenth of an inch. The causes of variation
were: inequalities of the bricks as they came from the mould,
inequalities in the slight interval between one and the next, 
displacement since they were laid, and difficulties of measure- 
ment. These causes are multiple and independent and each 
of small effect. It might be expected that their effects can 
be expressed as the sum of errors, and that the distribution of 
the measurements would be approximately normal and 
symmetrical. 


Distribution of Lengths of Four Bricks.


Length        Number of        Calculated by formula
(inches).     observations.    (normal curve).
35                  1                  .7
35 1/16             1                 1.4
35 1/8              3                 2.7
35 3/16             7                 5.1
35 1/4             11                 8.0
35 5/16             4                11.6
35 3/8             21                15.0
35 7/16             7                17.7
35 1/2             30                18.3
35 9/16            16                17.5
35 5/8             13                14.9
35 11/16            6                11.4
35 3/4             11                 7.9
35 13/16            7                 5.0
35 7/8              4                 2.7
35 15/16            1                 1.4
36                  0                  .6

Total             143               141.9


Except for an obvious tendency to give the measurements
to the nearest 1/8th of an inch instead of the 1/16th, the fit is fairly
satisfactory. When this tendency is corrected the fit is very
good.

3. I am indebted to Professor C. G. Seligman’s Some 
Aspects of the Hamitic Problem in the Anglo-Egyptian Sudan 
for the following measurements, whose frequency groups I 
analysed at his request. 


Skull and Stature Measurements of the Dinka Race.

(F(z) is entered for the lower boundary of each grade.)

                                Cephalic Index.         Nasal Index.            Stature.
Grades from          F(z).     Difference  Observa-    Difference  Observa-    Difference  Observa-
average.                       x 148.      tions.      x 85.       tions.      x 116.      tions.
Over 3σ              .4986        .2         2*           .1          0           .1          0
5/2σ to 3σ           .4938        .9         1            .5          1           .7          2
2σ to 5/2σ           .4772       2.3         2           1.3          0          1.8          1
3/2σ to 2σ           .4332       6.5         4           3.7          4          5.1          1
σ to 3/2σ            .3413      13.6        14           7.8          6         10.7          6
σ/2 to σ             .1915      22.2        18          12.8         13         17.4         22
0 to σ/2             0          28.3        30          16.3         27         22.2         24
-σ/2 to 0            .1915      28.3        30          16.3         12         22.2         25
-σ to -σ/2           .3413      22.2        25          12.8          8         17.4         23
-3/2σ to -σ          .4332      13.6        13           7.8          6         10.7          6
-2σ to -3/2σ         .4772       6.5         7           3.7          4          5.1          4
-5/2σ to -2σ         .4938       2.3         2           1.3          3          1.8          1
-3σ to -5/2σ         .4986        .9         0            .5          1           .7          1
Under -3σ                         .2         0            .1          0           .1          0

Total                            148        148           85         85          116        116

Average                               72.7                    91.6                  178.6 cm.
Standard deviation                    3.70                    13.0                    9.66

Except for the two extreme cases marked * the range is 
normal, and the deviations from the normal curve are not 
greater than is to be expected with so few examples. 

4. The lengths of 554 plaice measured in the North Sea
Fisheries Investigation gave the following results:

(z and F(z) are entered for the lower limit of each class.)

Length            z        F(z)     Difference   Observa-
cm.                                 x 554.       tions.
Above 35.5      +2.825     .4976       1.3           0
34.5 to 35.5    +2.076     .4810       9.2           6
33.5 to 34.5    +1.327     .4077      40.6          50
32.5 to 33.5     +.578     .2183     104.9         105
31.5 to 32.5     -.171     .0679     158.6         166
30.5 to 31.5     -.920     .3212     140.3         145
29.5 to 30.5    -1.669     .4524      72.7          61
28.5 to 29.5    -2.418     .4922      22.0          10
27.5 to 28.5    -3.167     .4992       3.9           7
26.5 to 27.5    -3.916     .5           .4           3
25.5 to 26.5    -4.665     .5          0             1


Average 31.728; σ = 1.335.


The agreement is not close at the extremities.

5. The numbers of school children of various ages in the
sixth grade are given in a report of the public schools of
St. Louis, U.S.A.

The following table compares the data with the first and
second approximations to the law of great numbers:


Ages.     Number of     Numbers calculated from
          children.     1st Approx.    2nd Approx.
10-            26            39             27
11-           201           207            204
12-           673           630            670
13-         1,001           983            995
14-           739           785            746
15-           310           323            307
16-            80            67             79
17-            13             9             15
18-             1             0              0


Average age, 13.665; σ = 1.100; κ = .2059.


The first approximation fits well within 2σ of the average.

The second approximation is remarkably close to the
observations (see diagram in Appendix, Note 6).

6. The speeds of 100 pedestrians were calculated from
observing the time they took between two marks (Die Schwankungen
der landwirtschaftlichen Reinerträge, Mitscherlich).

Average velocity, 1.5846 metres per second, σ = .2179 m.


Speed.                               Numbers at various speeds.
                                     Calculated.     Actual.
Average + .50 m. or more                 1.1            2
   + .40 to + .50                        2.3            2
   + .30 to + .40                        5.0            4
   + .20 to + .30                        9.6           11
   + .10 to + .20                       14.3           10
     0   to + .10                       17.7           18
   - .10 to   0                         17.7           20
   - .20 to - .10                       14.3           15
   - .30 to - .20                        9.6            8
   - .40 to - .30                        5.0            7
   - .50 to - .40                        2.3            3
   - .50 or less                         1.1            0


7. From material collected by the Working Class Cost of
Living Committee, 1918, the expenditure on food in one week
by 970 urban families was determined, and the results divided
by the number of "equivalent adults" (where a child is taken
as a fraction of an adult). The average was 10.75s.; σ = 3.156,
κ = .84.


                                       Number of Families.
Weekly expenditure per      Actual.    Calculated by    Calculated from
"unit" on food.                        2nd approx.      Pearson's Type III.*
Not exceeding 5.5s.            18           22                  7
 5.5s.-                       107          123                122
 7.5-                         255          233                252
 9.5-                         245          248                250
11.5-                         173          168                172
13.5-                         101           89                 95
15.5-                          38           51                 43
17.5-                          22 \
19.5-                           9  } 33     35                 27
Over 21.5                       2 /

(The last three classes are taken together in the calculated columns.)


$\beta_1 = .708$, $\beta_2 = 4.035$.


8. The price of flour was determined in U.S.A. in 272 places.
In five towns the price was given as 4 cents per lb., and
these were evidently exceptional and are excluded. For the
remaining 267 the average was 2.629 cents per lb., and σ = .3334.


* See p. 345.

Towns Classified According to the Price of Flour.


Distance from            Calculated      Actual.
average.                 1st Approx.
+ 3σ or more                  .4            2
+ 5/2σ to 3σ                 1.3            3
+ 2σ to 5/2σ                 4.4            1
+ 3/2σ to 2σ                11.8            9
+ σ to 3/2σ                 24.5           16
+ σ/2 to σ                  40.0           43
  0 to σ/2                  51.1           67
- σ/2 to 0                  51.1           47
- σ to - σ/2                40.0           37
- 3/2σ to - σ               24.5           25
- 2σ to - 3/2σ              11.8            9
- 5/2σ to - 2σ               4.4            4
Less than - 5/2σ             1.7            4


In the range average ± 2σ the agreement is fairly satisfactory
and satisfies the test explained below (Chapter X).



CHAPTER IV. 


APPLICATIONS OF THE LAW OF ERROR. 

Precision of Sums and Averages. 

It follows from the previous chapter that if n measurable
things are selected at random from a universe where the sizes
are distributed in a frequency group which is fairly continuous,
and little of it far from its average as compared with
its standard deviation (σ), then the average belongs to a
frequency curve whose standard deviation is $\sigma/\sqrt{n}$ and whose
form is approximately normal.

σ has generally to be determined from the observations
themselves, and may differ from that of the universe, but only
by a quantity of order $1/\sqrt{n}$ (see p. 417 below).

The first illustration that follows (persons per tenement)
gives 12 cases where the averages of samples are compared
with the averages of the universes of which they are samples.

The two following illustrations (digits and latitudes) show 
how the distribution of a number of averages agrees with the 
normal curve of error. 

In cases in which the theory applies, not only the standard 
deviation of the average can be given, but also the chances 
that the error of the average will exceed any given multiple 
of that standard deviation. 

Since the universe is unknown it cannot always be stated
whether its frequency group satisfies Edgeworth's conditions
(p. 299) or not. We can sometimes test this from the samples
themselves. Suppose that we take $k$ samples each of $n'$ items,
and form their averages $x_1, x_2, \ldots, x_k$ into a frequency group.
Then if the conditions in the universe are satisfied, this
frequency group should be approximately normal, but not
completely normal if $n'$ is not large. If this is the case a major
sample may now be formed consisting of $n = n' \times k$ items.
Its average equals the average of $x_1, x_2, \ldots, x_k$, and
as it is formed by selection of $k$ things from a group,
which, being approximately normal, satisfies the conditions
in question, we may expect that the error in the
major average has normal frequency with standard
deviation $\frac{\sigma}{\sqrt{n}}$, where σ is calculated from the $n$ observations.

The $k$ quantities $x_1, x_2, \ldots, x_k$ should have for their standard
deviation $\frac{\sigma}{\sqrt{n'}}$ approximately.

Thus in the example on p. 315 below the distribution of
the 2000 items which are aggregated in 80 groups is not known.
Here $k = 80$, $n' = 25$. The averages of the 80 groups are
found to have standard deviation 1.628. It may be deduced
that the standard deviation in the universe is approximately
$1.628 \times \sqrt{25} = 8.14$. Then the standard deviation for the
average based on the whole 2000 is

$$\frac{1.628 \times \sqrt{25}}{\sqrt{2000}} = \frac{1.628}{\sqrt{80}},$$

as stated below.

As an alternative we can examine the frequency group
formed by the n selections and see if it satisfies Edgeworth's
conditions. If it does, we may take it that the error of the
average has normal frequency.
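
The arithmetic of the major sample can be laid out as a short calculation. This is a sketch only; the function names are introduced here for illustration and the figures are those of the example just quoted (80 groups of 25 items).

```python
from math import isclose, sqrt

def universe_sd_from_group_averages(sd_of_averages, items_per_group):
    """If group averages have standard deviation sigma / sqrt(n'),
    the universe standard deviation is recovered by multiplying by sqrt(n')."""
    return sd_of_averages * sqrt(items_per_group)

def sd_of_major_average(sd_of_averages, items_per_group, groups):
    """Standard deviation of the average of all n = n' * k items."""
    sigma = universe_sd_from_group_averages(sd_of_averages, items_per_group)
    return sigma / sqrt(items_per_group * groups)

sigma = universe_sd_from_group_averages(1.628, 25)
major = sd_of_major_average(1.628, 25, 80)
print(round(sigma, 2), round(major, 2))
```

The factor $\sqrt{25}$ cancels, so the result is simply the standard deviation of the group averages divided by the square root of the number of groups.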


Precision of Averages. 

A sample was taken (as described on p. 281) of the house- 
holders' Census schedules in a number of districts, and the 
average number of persons per tenement was calculated in 
12 districts. 




                           In sample of 1 in 50.             In whole district.
Registration district.     Tene-     Persons.   Persons     Tenements.   Persons     Standard
                           ments.               per                      per         deviation.
                                                tenement.                tenement.
Bethnal Green, N.E.         277       1,224       4.42        13,850       4.35        .14
Bethnal Green, S.W.         278       1,261       4.54        13,905       4.60        .14
Shoreditch, S.              187         792       4.24         9,331       4.26        .18
Shoreditch, N.W.            152         693       4.56         7,623       4.34        .19
Shoreditch, N.E.            156         653       4.19         7,847       4.39        .19
Spitalfields                130         637       4.90         6,476       4.79        .21
Whitechapel                 117         519       4.44         5,914       4.72        .22
St. George                  187         924       4.93         9,374       4.88        .18
Shadwell                     95         387       4.07         4,800       4.37        .25
Limehouse                   133         611       4.59         6,655       4.54        .21
Mile End, S.W.              267       1,211       4.54        13,366       4.71        .15
Mile End, N.E.              207         839       4.05        10,364       4.40        .17

From a study of the Census volumes relating to the whole
districts the standard deviation (σ) of the number of persons
per tenement is found to range from 2.38 to 2.75.

The standard deviation for the first entry above is then
$\frac{2.38}{\sqrt{277}} = .14$ if we take the lower and more stringent value of
σ. The other standard deviations are calculated similarly.

The differences between the sample averages and the whole
in 6 cases are less than the calculated standard deviation, in
4 cases exceed it by less than a quarter of itself, in 1 case by
30 per cent., and in 1 case the difference is twice the standard
deviation.
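
The last column of the table is obtained by dividing the lower value of σ by the square root of the number of tenements in the sample. A brief sketch for two of the districts follows; the helper name and the choice of rows are for illustration only, and the figures are taken from the table above.

```python
from math import sqrt

SIGMA_LOW = 2.38   # lower (more stringent) standard deviation of persons per tenement

def sd_of_sample_average(tenements, sigma=SIGMA_LOW):
    """Standard deviation of the average of a sample of the given size."""
    return sigma / sqrt(tenements)

# (district, tenements in sample, sample average, whole-district average)
rows = [
    ("Bethnal Green, N.E.", 277, 4.42, 4.35),
    ("Mile End, N.E.", 207, 4.05, 4.40),
]
for district, tenements, sample_avg, whole_avg in rows:
    sd = sd_of_sample_average(tenements)
    ratio = abs(sample_avg - whole_avg) / sd
    print(f"{district:22s} sd = {sd:.2f}   difference / sd = {ratio:.1f}")
```

The first district falls well inside one standard deviation; the second is the case in which the difference is about twice the standard deviation.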

Normal Distribution of Averages. 

Ten digits were selected from successive final digits in
seven-figure mathematical tables and summed, and the process
repeated till 1000 totals were obtained.*

* Such selections are found not to satisfy completely the conditions of
independence. See Statistical Journal, 1912-13, p. 702 (Nixon).

The average and the standard deviation of the group so
obtained were 45.014 and 9.205 respectively, as compared
with 45 and $\sqrt{82.5} = 9.083$, which would be obtained from
an indefinitely large random selection if the digits 0 to 9
were equally distributed.

The following table compares the distribution of the 1000
with the normal curve of error.

Number of Totals of 10 Falling Within Certain Limits.


Distance from         Calculated.   Standard     Observations.   Differences.
average.                            deviation.
Above 5/2σ                  6          2.4             8             + 2
2σ to 5/2σ                 17          4.1            17               0
3/2σ to 2σ                 44          6.5            47             + 3
σ to 3/2σ                  92          9.1            75             -17
σ/2 to σ                  150         11.3           157             + 7
0 to σ/2                  191         12.4           197             + 6
0 to -σ/2                 191         12.4           201             +10
-σ/2 to -σ                150         11.3           148             - 2
-σ to -3/2σ                92          9.1            77             -15
-3/2σ to -2σ               44          6.5            50             + 6
-2σ to -5/2σ               17          4.1            20             + 3
Below -5/2σ                 6          2.4             3             - 3


The standard deviations are calculated from the formula
$\sqrt{p(1 - p)n}$ (formula (13)), where $n = 1000$ and $p$ is the
proportion that falls within a grade in the normal law; thus
between σ and 3/2σ .092 of all are expected and $p = .092$.

In arranging the observations σ is taken as 9.205.

The differences between theory and observation are less
than the standard deviation in 9 cases, and exceed it but are
less than double in 3 cases.

The normal curve is therefore an adequate representation
of the group.
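
The "Calculated" and "Standard deviation" columns can be recomputed for any grade. The following is a minimal sketch: the function names are introduced here, and the normal area is computed from the error function rather than read from the table on p. 271.

```python
from math import erf, sqrt

def normal_area(lower, upper):
    """Proportion of the normal curve between two deviations measured in units of sigma."""
    def cdf(z):
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))
    return cdf(upper) - cdf(lower)

def grade_expectation(lower, upper, n):
    """Expected number in a grade and its standard deviation sqrt(p (1 - p) n)."""
    p = normal_area(lower, upper)
    return n * p, sqrt(p * (1.0 - p) * n)

expected, sd = grade_expectation(1.0, 1.5, 1000)
print(round(expected), round(sd, 1))
```

For the grade between σ and 3/2σ this reproduces the proportion .092 quoted in the text, the 92 totals expected out of 1000, and the standard deviation 9.1.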

The average as found from the whole sample of 10,000 is
determined as 45.014 with standard deviation $\frac{9.205}{\sqrt{1000}} = .29$,
and is unexpectedly near the average of the sum of 10 digits in
general.

From a geographical index containing 31,210 names, 25
were selected roughly at random, and their latitudes entered
to a degree (ignoring minutes), the distinction between north
and south being ignored. The 25 latitudes were then averaged
and the process repeated till 80 averages were obtained.

The general average was 35.0° and the standard deviation
of the group of 80 averages was 1.628°.

The following table compares the distribution with the
normal curve of error on the same plan as in the last example.


Distance from         Calculated.   Standard     Observations.   Differences from
average.                            deviation.                   nearest integer.
Above 5/2σ                 .5           -              1             0 or 1
2σ to 5/2σ                1.3          1.1             2             1
3/2σ to 2σ                3.5          1.8             1             2 or 3
σ to 3/2σ                 7.4          2.6            10             3
σ/2 to σ                 12.0          3.2            12             0
0 to σ/2                 15.3          3.5            12             3
0 to -σ/2                15.3          3.5            16             1
-σ/2 to -σ               12.0          3.2            15             3
-σ to -3/2σ               7.4          2.6             7             0
-3/2σ to -2σ              3.5          1.8             2             1 or 2
-2σ to -5/2σ              1.3          1.1             1             0
Below -5/2σ                .5           -              1             0 or 1


In eight cases the difference is below the standard
deviation, and in the remaining two slightly above it.

The average 35.0°, as found from the sample of
80 × 25 = 2000 latitudes, has standard deviation
$\frac{1.628}{\sqrt{80}}$ degrees = .18 degree, and is
therefore not known accurately to the first decimal place.
Absolute Errors in Weighted Sums and Averages.

It has been shown above (p. 288) that if $H + E$ is the sum
of $n$ quantities selected independently from $n$ frequency
groups, whose averages are ${}_1u, {}_2u, \ldots, {}_nu$, and standard
deviations $\sigma_1, \sigma_2, \ldots, \sigma_n$, and $H = {}_1u + {}_2u + \cdots + {}_nu$, then
$H + E_t$, the sum in any selection, $= {}_1u_t + {}_2u_t + \cdots + {}_nu_t$, has
for its standard deviation $s$, where $s^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2$.

The same analysis can readily be re-arranged to show that,
if we take a weighted sum

$$H + E_t = W_1 \cdot {}_1u_t + W_2 \cdot {}_2u_t + \cdots + W_n \cdot {}_nu_t,$$

where $W_1, W_2, \ldots$ are constants, the standard deviation becomes

$$s^2 = W_1^2\sigma_1^2 + W_2^2\sigma_2^2 + \cdots + W_n^2\sigma_n^2 = S(W_t^2\sigma_t^2) \qquad (55)$$

and the standard deviation, $s_a$, of the weighted average
$\frac{H + E}{SW_t}$ is given by

$$s_a^2 = \frac{S(W_t^2\sigma_t^2)}{(SW_t)^2}. \qquad (56)$$


If n is large, so that is negligible, and the other condi- 
tions stated on p. 299 are satisfied, the frequencies of the 
sum and average are normal, and the table on p. 271 can be 
used to ascertain the chances of deviations from the mean 
value H. 


Let $\bar{\sigma}^2$ be the weighted mean value of $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, so that

$$\bar{\sigma}^2\,S(W_t^2) = S(W_t^2\sigma_t^2).$$

Then $s^2 = \bar{\sigma}^2\,S(W_t^2)$,

and
$$s_a^2 = \frac{\bar{\sigma}^2\,S(W_t^2)}{(SW_t)^2}. \qquad (57)$$

Now let $SW_t = n\bar{w}$, and $W_t = \bar{w} + w_t$, and $n\sigma_w^2 = S(w_t^2)$, so
that $\bar{w}$ and $\sigma_w$ are the average and standard deviation of the W's
regarded as a frequency group. $Sw_t = 0$.

Then
$$S(W_t^2) = S(\bar{w}^2 + 2\bar{w}w_t + w_t^2) = n\bar{w}^2 + 2\bar{w}\,Sw_t + Sw_t^2 = n(\bar{w}^2 + \sigma_w^2) \qquad (58)$$

and
$$s^2 = n\bar{\sigma}^2(\bar{w}^2 + \sigma_w^2) \qquad (59)$$

and therefore
$$s_a = \frac{\bar{\sigma}}{\sqrt{n}}\sqrt{1 + \frac{\sigma_w^2}{\bar{w}^2}}. \qquad (60)$$

The last formula gives in a convenient form the standard
deviation of a weighted average, when the weights are known
and not subject to error. The deviation of the original items
is reduced in the ratio $1 : \sqrt{n}$, and becomes small when $n$ is
great, while the factor $\sqrt{1 + \frac{\sigma_w^2}{\bar{w}^2}}$ is rarely as great as $\sqrt{2}$,
since $\frac{\sigma_w}{\bar{w}}$ measures the ratio of the standard deviation of
the weights to their mean value, and this ratio in ordinary
cases is less than unity.

If the average is unweighted, we have, of course,

$$s_a = \frac{\bar{\sigma}}{\sqrt{n}}. \qquad (61)$$

The fundamental formula $s^2 = S(W_t^2\sigma_t^2)$ was used by the
Committee of the British Association on Small Incomes.
(See Statistical Journal, 1910, December, p. 62, where different
letters are employed.)
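
That formula (60) is only a rearrangement of formula (56) can be confirmed numerically. The sketch below uses hypothetical weights and standard deviations, and function names introduced here for illustration; it computes the standard deviation of a weighted average both ways.

```python
from math import isclose, sqrt

def sd_weighted_average_direct(weights, sigmas):
    """Formula (56): s_a^2 = S(W^2 sigma^2) / (S W)^2."""
    return sqrt(sum((w * s) ** 2 for w, s in zip(weights, sigmas))) / sum(weights)

def sd_weighted_average_factored(weights, sigmas):
    """Formula (60): s_a = (sigma_bar / sqrt(n)) * sqrt(1 + sigma_w^2 / w_bar^2)."""
    n = len(weights)
    sum_w2 = sum(w * w for w in weights)
    # sigma_bar^2 is the mean of the sigma^2 weighted by W^2.
    sigma_bar2 = sum((w * s) ** 2 for w, s in zip(weights, sigmas)) / sum_w2
    w_bar = sum(weights) / n
    sigma_w2 = sum((w - w_bar) ** 2 for w in weights) / n
    return sqrt(sigma_bar2 / n) * sqrt(1.0 + sigma_w2 / w_bar ** 2)

weights = [1.0, 2.0, 3.0, 4.0]    # hypothetical weights
sigmas = [0.5, 0.4, 0.6, 0.3]     # hypothetical standard deviations
direct = sd_weighted_average_direct(weights, sigmas)
factored = sd_weighted_average_factored(weights, sigmas)
print(direct, factored)   # the two values should agree
```

With equal weights the factor under the second square root reduces to unity and the unweighted result (61) is recovered.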

There were 31 classes in each of which the number not paying
income tax was estimated as, say, $N_t$, with standard deviation $s_t$;
their average income was $I_t$ with standard deviation $s'_t$. The
aggregate income of the class is then $N_t I_t$ with standard
deviation $\sigma_t$, where

$$\sigma_t^2 = \text{Mean}\{(N_t + e_t)(I_t + e'_t) - N_t I_t\}^2 = \text{Mean}\{N_t e'_t + I_t e_t\}^2 = N_t^2 s_t'^2 + I_t^2 s_t^2,$$

when products of e's are neglected.

The standard deviation for the sum of $N_t I_t$ is therefore $s$,
where

$$s^2 = S(N_t^2 s_t'^2 + I_t^2 s_t^2).$$

$s_1, s_2, \ldots$ and $s'_1, s'_2, \ldots$ were estimated separately for each class.

If we suppose that the numbers in the classes were known
exactly and only the average incomes in the classes subject to
error, then we should use the formula above, $s^2 = S(W_t^2\sigma_t^2)$, which is
in this case $S(N_t^2 s_t'^2)$, and which we of course also obtain by writing
$s_t = 0$. The standard deviation of error in the average income
of all the classes taken together is then $s \div S(N_t)$.

In the investigations (If 2 Sf 2 ) =315 x io®, S(NfVf f ) = 4 x io 8 , 
so that the errors in N were not important. $S(N_t) = 4023$,
$S(N_t I_t) = 284{,}700$, and the average income in 1910 of persons
other than wage-earners not paying income tax may be written
as £71 with standard deviation £5.
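
The closing figures can be retraced from the totals quoted. The sketch below is illustrative only: it assumes, as the preceding paragraph indicates, that the numbers in the classes are treated as exact, so that the standard deviation of the average is the square root of $S(N_t^2 s_t'^2)$ divided by $S(N_t)$; the variable names are introduced here and the units are those of the source.

```python
from math import sqrt

total_persons = 4023        # S(N_t), as quoted
total_income = 284_700      # S(N_t I_t), as quoted
sum_squares = 4e8           # S(N_t^2 s'_t^2), as quoted

average_income = total_income / total_persons
sd_of_average = sqrt(sum_squares) / total_persons
print(round(average_income, 1), round(sd_of_average, 1))
```

Rounded to whole pounds these give the £71 and £5 stated in the text.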


Relative Errors. 

So far we have dealt with absolute errors and deviations, 
the actual differences between particular values and observa- 
tions from their means or true values. It is now proposed to 
discuss relative errors and deviations (as used in Part I., 
Chapter VIII, supra). 

If $x$ is the observed value of a quantity whose true value
or mean (as the case may be) is $x'$, and $x = x'(1 + e)$, then
$e = \frac{x - x'}{x'}$ is the relative error or deviation.*

1. Products and Quotients. 

If two factors $F_1$, $F_2$ are independent, and erroneously
measured as $F_1(1 + e_1)$, $F_2(1 + e_2)$, and $e$ is the resulting relative
error in their product $P$, we have

$$P(1 + e) = F_1(1 + e_1) \cdot F_2(1 + e_2), \text{ where } P = F_1 F_2 \qquad (62)$$

$$e = e_1 + e_2 + e_1 e_2 = e_1 + e_2, \text{ if products of } e\text{'s are negligible.}$$

Hence if $\sigma$, $\sigma_1$, $\sigma_2$ are the standard deviations of $P$, $F_1$, $F_2$,
we have by the formula (34), p. 288, $\sigma^2 = \sigma_1^2 + \sigma_2^2$.

The result can be extended to any finite number of factors,
so that

$$\sigma^2 = \sigma_1^2 + \sigma_2^2 + \sigma_3^2 + \dots \qquad (63)$$

The error of $x^n$, $n$ finite, is given by $x^n(1 + e) = \{x(1 + e_1)\}^n$,
where $e_1$ is the error of $x$.

$$e = n e_1 + \frac{n(n - 1)}{2} e_1^2 + \dots = n e_1 \qquad (64)$$

when squares are neglected, and the standard deviation of
$x^n$ is $n\sigma$ where $\sigma$ is that of $x$.

The result is true when $n$ is fractional. E.g. the error in a
cube root is one-third the error of the quantity. Thus if a
number 1006 is taken as 1000 (relative error 0.006), the relative
error in the cube root is 0.002, the root being given as 10, instead
of 10.02 = 10(1 + 0.002) approx.
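
The rules (63) and (64) can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function names are illustrative assumptions. It compares the first-order rule $e = n e_1$ with the exact relative error for the cube-root case quoted above, and combines independent relative standard deviations in quadrature.

```python
import math

def exact_power_error(e1: float, n: float) -> float:
    """Exact relative error e in x**n, from x**n * (1 + e) = (x * (1 + e1))**n."""
    return (1.0 + e1) ** n - 1.0

def first_order_power_error(e1: float, n: float) -> float:
    """Formula (64) with squares of e1 neglected: e = n * e1."""
    return n * e1

def product_relative_sd(*sds: float) -> float:
    """Formula (63): relative standard deviations of independent factors
    combine as the square root of the sum of their squares."""
    return math.sqrt(sum(s * s for s in sds))

# Cube root of a quantity carrying a relative error of 0.006.
approx = first_order_power_error(0.006, 1.0 / 3.0)
exact = exact_power_error(0.006, 1.0 / 3.0)
print(f"first order: {approx:.6f}   exact: {exact:.6f}")
```

The two values differ only in the sixth decimal place, which is the sense in which squares of the error are "neglected".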


* In Chapter VIII above it was more convenient to take $\frac{x - x'}{x}$ as the
error. If we call this $e_1$, the relation between the two is $e_1 = e - e^2 + e^3 - \dots$
and $e_1 = e$, when, as may generally be presumed, $e^2$ is negligible.

If $e$ is the error in $Q = F_1/F_2$, and $F_1$ and $F_2$ are independent of
each other,

$$Q(1 + e) = \frac{F_1(1 + e_1)}{F_2(1 + e_2)},$$

$$e = (1 + e_1)(1 + e_2)^{-1} - 1 = e_1 - e_2 + \text{squares and products,} \qquad (65)$$

or $\sigma^2 = \sigma_1^2 + \sigma_2^2$, where $\sigma$ is the standard deviation of $e$. (66)


If $e$ is the error in a power, $a^x$, where $a$ is known, and $e_1$ is the
error in $x$,

$$a^x(1 + e) = a^{x(1 + e_1)},$$

$$e = a^{x e_1} - 1 = e_1 \cdot x \log a \text{ when } e_1^2 \text{ is neglected.} \qquad (67)$$

Generally if $e$ is the error in a function, $f(x)$,

$$f(x) \times (1 + e) = f\{x(1 + e_1)\} = f(x) + e_1 x f'(x) + \dots,$$

$$\text{and } e = e_1 \frac{x f'(x)}{f(x)}. \qquad (68)$$

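2. In Averages.

Let $m$ be the unweighted average of $n$ quantities $M_1, M_2 \dots
M_t \dots M_n$, and let $M_t = m + m_t$, so that $S m_t = 0$. Let $n\sigma_m^2 = S m_t^2$.

Suppose the quantities erroneously observed as $M_t(1 + e_t)$ for
$M_t$, etc., and let $e$ be the relative error in their average.

$$m(1 + e) = \frac{1}{n} S\{M_t(1 + e_t)\} = m + \frac{1}{n} S(M_t e_t).$$

Then

$$e = \frac{S(M_t e_t)}{nm}. \qquad (69)$$

If $s$, $\sigma_t$ are the standard deviations of $e$, $e_t$, then by formula
(55), p. 316,

$$s^2 = \frac{S(M_t^2 \sigma_t^2)}{n^2 m^2} = \sigma^2 \frac{S(M_t^2)}{n^2 m^2},$$

if $\sigma^2$ is the weighted mean of $\sigma_1^2 \dots \sigma_t^2 \dots$ or if all these
standard deviations are equal; and, since $S m_t = 0$,

$$s^2 = \sigma^2 \frac{S(m + m_t)^2}{n^2 m^2} = \sigma^2 \frac{n m^2 + n \sigma_m^2}{n^2 m^2} = \frac{\sigma^2}{n}\left(1 + \frac{\sigma_m^2}{m^2}\right). \qquad (70)$$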


This formula is of the same form as that relating to absolute
errors in a weighted average. (Formula (60).)
In this formula $\sigma_m$, $m$ and $\sqrt{n}$ are known, and therefore
the ratio $s/\sigma$ can be stated exactly. $\sigma$ has to be estimated
from whatever circumstances are known about the individual
measurements.

The conditions of p. 299 are generally satisfied when an 
average is computed, if the conditions of random sampling are 
preserved, and therefore the normal table of frequency is 
applicable if n is large ; it is approximately applicable if n is 
no greater than 20. 


3. In Weighted Averages. 

[Based on article in Stat. Journal , 1911-12, pp. 81-88.] 

Let $m_w = \frac{S(W_t M_t)}{S(W_t)}$, where $M_t$ (and $m$, $\sigma_m$) have the same meaning
as before, and $W_1, W_2 \dots W_t \dots W_n$ are weights.

Let $W_t = w + w_t$, where $nw = S W_t$ and $S w_t = 0$, and let
$n\sigma_w^2 = S w_t^2$.

Then $S(W_t M_t) = n w m_w$.

Suppose that the weights are imperfectly known, so that
$W_t(1 + \eta_t)$ is taken instead of $W_t$, etc.

Let the errors in the M's be as before, and let $e$ be the resulting
error in $m_w$.

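Then

$$m_w(1 + e) = \frac{S\{W_t(1 + \eta_t) M_t(1 + e_t)\}}{S\{W_t(1 + \eta_t)\}}$$

$$\therefore e = \frac{S\{W_t(1 + \eta_t) M_t(1 + e_t)\} \cdot S W_t - S(W_t M_t) \cdot S\{W_t(1 + \eta_t)\}}{S(W_t M_t) \cdot S\{W_t(1 + \eta_t)\}}$$

$$= \frac{S(W_t M_t e_t) \cdot S W_t + S(W_t M_t \eta_t) \cdot S W_t - S(W_t \eta_t) \cdot S(W_t M_t)}{S(W_t M_t) \cdot S W_t},$$

neglecting $e_t \eta_t$ and $\eta_t^2$,

$$= \frac{S(W_t M_t e_t)}{S(W_t M_t)} + \frac{S\{(W_t M_t \cdot nw - W_t \cdot nw\, m_w)\eta_t\}}{S(W_t M_t) \cdot nw}$$

$$= \frac{S(W_t M_t e_t)}{nw\, m_w} + \frac{S\{W_t(m_t + m - m_w)\eta_t\}}{nw\, m_w}. \qquad (71)$$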


Now

$$m_w = \frac{S\{(w + w_t)(m + m_t)\}}{nw} = \frac{nwm + m S w_t + w S m_t + S w_t m_t}{nw} = m\left\{1 + \frac{S(w_t m_t)}{nwm}\right\},$$

$$\text{and } m_w - m = \frac{S(w_t m_t)}{nw}. \qquad (72)$$

$$\therefore e = \frac{S(W_t M_t e_t)}{nw\, m_w} + \frac{S(W_t m_t \eta_t)}{nw\, m_w}$$

approximately, if the difference $m - m_w$ is neglected, while in full
the numerator of the second term is $S\{W_t(M_t - m_w)\eta_t\}$.

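Let $s$, $\sigma_t$, $\sigma'_t$ be the standard deviations of $e$, $e_t$, $\eta_t$.

Then

$$s^2 = \frac{S\{(W_t M_t \sigma_t)^2\}}{(nw\, m_w)^2} + \frac{S\{W_t^2 (M_t - m_w)^2 \sigma_t'^2\}}{(nw\, m_w)^2}. \qquad (73)$$

Let $\sigma_1 = \sigma_2 = \dots = \sigma$, and $\sigma'_1 = \sigma'_2 = \dots = \sigma'$. Or let $\sigma^2$, $\sigma'^2$
be weighted averages so that

$$\sigma^2 S\{(W_t M_t)^2\} = S\{(W_t M_t \sigma_t)^2\} \text{ and } \sigma'^2 S\{W_t^2 (M_t - m_w)^2\} = S\{W_t^2 (M_t - m_w)^2 \sigma_t'^2\}.$$

Then

$$s^2 = \frac{1}{(nw\, m_w)^2}\left[\sigma^2 S\{(W_t M_t)^2\} + \sigma'^2 S\{W_t^2 (M_t - m_w)^2\}\right]. \qquad (74)$$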

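$\sigma$ and $\sigma'$ must be estimated from whatever errors seem probable
or possible in the circumstances of the measurement.

The other quantities involved can be calculated from the observations.
A good approximation in ordinary cases to this result is

$$s^2 = \frac{\sigma^2}{n}\left(1 + \frac{\sigma_w^2}{w^2}\right)\left(1 + \frac{\sigma_m^2}{m^2}\right) + \frac{\sigma'^2}{n}\left(1 + \frac{\sigma_w^2}{w^2}\right)\frac{\sigma_m^2}{m^2}.* \qquad (75)$$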


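* This approximation is obtained as follows:

$$S\{(W_t M_t)^2\} = S\{(w^2 + 2w w_t + w_t^2)(m^2 + 2m m_t + m_t^2)\}$$

$$= n w^2 m^2 + n w^2 \sigma_m^2 + n m^2 \sigma_w^2 + n \sigma_w^2 \sigma_m^2 + S w_t^2 (m_t^2 - \sigma_m^2) + 4mw\, S w_t m_t + 2w\, S w_t m_t^2 + 2m\, S m_t w_t^2$$

$$\therefore \frac{S\{(W_t M_t)^2\}}{n (wm)^2} = \left(1 + \frac{\sigma_w^2}{w^2}\right)\left(1 + \frac{\sigma_m^2}{m^2}\right) + 4 r \frac{\sigma_w \sigma_m}{wm} + 2 r_{12} \frac{\sigma_w \sigma_m^2}{w m^2} + 2 r_{21} \frac{\sigma_w^2 \sigma_m}{w^2 m} + R_{22} \frac{\sigma_w^2 \sigma_m^2}{w^2 m^2},$$

where

$$r = \frac{S w_t m_t}{n \sigma_w \sigma_m}, \quad r_{12} = \frac{S w_t m_t^2}{n \sigma_w \sigma_m^2}, \quad r_{21} = \frac{S w_t^2 m_t}{n \sigma_w^2 \sigma_m}, \quad R_{22} = \frac{S w_t^2 (m_t^2 - \sigma_m^2)}{n \sigma_w^2 \sigma_m^2}.$$

$$S\{W_t^2 (M_t - m_w)^2\} = S\{W_t^2 (m_t + m - m_w)^2\} = S W_t^2 m_t^2 + 2(m - m_w) S W_t^2 m_t + (m - m_w)^2 S W_t^2$$

$$= w^2 \cdot n \sigma_m^2 + 2w\, S w_t m_t^2 + S w_t^2 m_t^2 - \frac{2}{nw} S w_t m_t \,(2w\, S w_t m_t + S w_t^2 m_t) + \left(\frac{S w_t m_t}{nw}\right)^2 n (w^2 + \sigma_w^2),$$

and

$$\frac{S\{W_t^2 (M_t - m_w)^2\}}{n (wm)^2} = \frac{\sigma_m^2}{m^2}\left\{\left(1 + \frac{\sigma_w^2}{w^2}\right) + \left(\frac{\sigma_w^4}{w^4} - 3\frac{\sigma_w^2}{w^2}\right) r^2 + 2\frac{\sigma_w}{w} r_{12} - 2\frac{\sigma_w^3}{w^3} r\, r_{21} + \frac{\sigma_w^2}{w^2} R_{22}\right\}.$$

Now $r$, $r_{12}$, $r_{21}$, $R_{22}$ each contain in their formulae factors ($m_t$, $w_t$ or
$m_t^2 - \sigma_m^2$) whose sum is zero, and therefore, unless large values of the other
factor ($m_t^2$, $w_t^2$) are found specially with positive or specially with negative
values, the sum of the products is small, and terms containing these tend to
be small in comparison with the other terms. Also $\frac{m_w}{m} = 1 + r \frac{\sigma_w \sigma_m}{wm}$.

If we neglect $r$, $r_{12}$, $r_{21}$, $R_{22}$ we obtain the approximation given above.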


Examples . 

Some examples, worked in detail, will show the relative 
magnitude of the quantities involved. 

1. The first is a calculation of wages, where the weights
are taken with great roughness, the number of persons of both
sexes and all ages being taken for weights to compute the
average wage of men only. It is only in very imperfect
investigations that so deliberate an error would be introduced.

The contribution due to the errors in observations of
quantities, typified by $\sigma$, has in the approximate formula two
factors, each of which is always greater than 1 and generally
less than 2; these factors can be computed from the observations.

On the other hand the contribution due to errors in weights,
typified by $\sigma'$ in (75), contains the factor $\frac{\sigma_m^2}{m^2}$ both in the
approximate and in the complete formula, i.e. the square of
the ratio of the standard deviation of the quantities to their
mean value. In the cases, which are quite common when
weighted averages are in question, where this ratio is small,
the effect of errors in weights is smaller, and sometimes very
much smaller, than the effect of equal errors in quantities.
Hence the statements (pp. 94 and 185) that under ordinary
conditions more attention should be paid to accuracy in
quantities than to accuracy in weights.

Finally, as regards weighted averages, the table of probability
on p. 271 may be applied to measure the chance of deviations
greater than $s$, $2s$, $3s$ ... if $n$ is great, and it gives
approximate values when $n$ is as small even as 20.

Metal Trades, Excluding Engineering and Shipbuilding, 1906.
Cd. 5814, p. xi for numbers, and p. xiii for wages.

                           Number of      Average
  Trade.                   persons        earnings
                           employed.      of men.
                              W              M
                            000's.         s.  d.
  Pig iron                    14           34   4
  Iron and steel              54           39   1
  Tinplate                    11           42   0
  Railway carriages           46           30   9
  Iron castings               12           31   4
  Electric apparatus          15           34   7
  Wire                         8           35   7
  Brass, etc.                  8           31   9
  Gold, silver, etc.           8           36   6
  Jewellery                    3           38   0
  Edge tools                   3           31   2
  Smelting                     8           31   5
  Cycles                       7           34   4
  Tubes                        7           28   3
  Nails, etc.                  5           31   0
  Bedsteads                    2           36   3
  Farriery                     2           27   9
  Scientific instruments       2           36  10
  Needles, etc.                2           31   9
  Chains, etc.                 1           35   4
  Locks, etc.                  1           28   0
  Watches and clocks           1           32   7
  Typefounding                 1           33   3
  Miscellaneous               45           32   5

  Total                      266



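$n$, the number of trades, = 24. $S(W) = 266$. $w = \frac{1}{n} S(W) = 11\tfrac{1}{12}$.

$m$, the arithmetical average of the 24 entries of earnings,
= 33.5115s.

$\sigma_m = 3.47$. $\sigma_w = 14.74$. $m_t$ and $w_t$ are the deviations of individual
entries from $m$ and $w$.

$$r = \frac{S w_t m_t}{n \sigma_w \sigma_m} = 0.150, \quad r_{12} = \frac{S w_t m_t^2}{n \sigma_w \sigma_m^2} = 0.095, \quad r_{21} = \frac{S w_t^2 m_t}{n \sigma_w^2 \sigma_m} = 0.280, \quad R_{22} = \frac{S m_t^2 w_t^2}{n \sigma_m^2 \sigma_w^2} - 1 = 0.264.$$

$$\frac{\sigma_w^2}{w^2} = 1.77, \quad \frac{\sigma_m}{m} = 0.104, \quad \frac{\sigma_w}{w} = 1.33.$$

$m_w$, the average of the earnings with the numbers given in the
table as weights, = 34s. 2½d., $= m\left(1 + r\frac{\sigma_w \sigma_m}{wm}\right)$; $\left(\frac{m}{m_w}\right)^2 = 0.959$.

Then working with the notation of p. 321,

$$\frac{S\{(W_t M_t)^2\}}{n (wm)^2} = 2.77 \times 1.011 + 4 \times 1.33 \times 0.104 \times 0.150 + 2 \times 1.33 \times 0.011 \times 0.095 + 2 \times 1.77 \times 0.103 \times 0.280 + 1.77 \times 0.011 \times 0.264$$

$$= 2.80 + 0.083 + 0.003 + 0.102 + 0.005 = 2.99.$$

$$\frac{S\{(W_t M_t)^2\}}{n (w m_w)^2} = 2.99 \times \left(\frac{m}{m_w}\right)^2 = 2.87.$$ The approximate formula gives 2.80.

$$\frac{S\{W_t^2 (M_t - m_w)^2\}}{n (wm)^2} = 0.011\,\{2.77 + (3.13 - 3.99) \times 0.0225 + 2 \times 1.33 \times 0.095 - 2 \times 2.35 \times 0.150 \times 0.280 + 1.77 \times 0.264\}$$

$$= 0.011\,\{2.77 - 0.020 + 0.253 - 0.198 + 0.467\} = 0.011\,(2.77 + 0.50) = 0.036.$$

$$\frac{S\{W_t^2 (M_t - m_w)^2\}}{n (w m_w)^2} = 0.036 \times \left(\frac{m}{m_w}\right)^2 = 0.035.$$ The approximate formula gives 0.031.

$$\therefore s^2 = \frac{1}{n}(2.87\sigma^2 + 0.035\sigma'^2).$$

The averages of the men's earnings in the separate trades
are perhaps subject to an error of 6d. in 33s., in which case
$\sigma = \frac{1}{66}$, $\sigma^2 = 0.00023$.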

The errors in the weights may be considerable, for the 
weights were deliberately taken as the whole number of persons 
instead of the number of men. 

The error so introduced is computed from p. 10 of the
report at about 0.23, so that $\sigma'^2 = 0.053$.

With these figures $s^2 = 0.000027 + 0.000077 = 0.000104$.

Hence the average may be written

$m_w(1 \pm s)$ or 34s. 2½d. ± 4d.

Though in this extreme case the error in the individual
weights is taken 15 times as great as the error in the quantities,
the resulting error is only 0.0088 as compared with 0.0052 due
to quantities.
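
The approximate formula (75) is simple enough to evaluate directly. The sketch below is a hedged Python illustration, not part of the original text; the function and argument names are assumptions. It takes the coefficients of variation $\sigma_w/w$ and $\sigma_m/m$ and returns the two multipliers of $\sigma^2$ and $\sigma'^2$, using the rounded figures of the wage example (1.33 and 0.104).

```python
import math

def approx_coefficients(cv_w: float, cv_m: float) -> tuple[float, float]:
    """Multipliers of sigma^2 and sigma'^2 in the approximate formula (75),
    before division by n. cv_w = sigma_w / w and cv_m = sigma_m / m."""
    spread = 1.0 + cv_w ** 2
    return spread * (1.0 + cv_m ** 2), spread * cv_m ** 2

def approx_relative_sd(n: int, cv_w: float, cv_m: float,
                       sigma: float, sigma_prime: float) -> float:
    """Relative standard deviation s of a weighted average by formula (75).
    sigma: relative error of the quantities; sigma_prime: of the weights."""
    coef_q, coef_w = approx_coefficients(cv_w, cv_m)
    return math.sqrt((coef_q * sigma ** 2 + coef_w * sigma_prime ** 2) / n)

# Rounded inputs from the metal-trades example: n = 24 trades.
coef_q, coef_w = approx_coefficients(1.33, 0.104)
s = approx_relative_sd(24, 1.33, 0.104, 1.0 / 66.0, 0.23)
print(f"coefficients: {coef_q:.2f}, {coef_w:.3f}; s = {s:.4f}")
```

The first coefficient reproduces the 2.80 quoted for the approximate formula; the second comes out near 0.030, against the printed 0.031, because the text rounds $\sigma_m^2/m^2$ to 0.011 before multiplying. The resulting $s$ is a little below the value from the full formula, since the approximation omits the $r$ terms.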

2. Perhaps the most important use of weighted averages
is as index-numbers of prices.

It was shown above (p. 204) that the change of the base 
year was equivalent to a change of weights ; such a change 
will by the theory used in this chapter produce an unimportant 
effect on the result, if the necessary conditions are found to hold. 

Sauerbeck's numbers of the prices of commodities were
tabulated for 1900 and 1911 and re-written with 1900 as base.
Thus in the first entry the price of English wheat was 49 in
1900, 58 in 1911, when the average of the years 1867-77 is
taken as 100. This was written 100 in 1900 and 118 in 1911.
The 45 numbers so obtained give an arithmetical average
107.82, while the averages of the numbers as given by Sauerbeck
in 1900 and 1911 were 75.07 and 79.69, whose ratio is
100 : 106.16.

In taking the simple average, when all the numbers are 100,
we in effect give equal weights to the ratios; whereas in
Sauerbeck's setting, if $p_1, p_2 \dots$ are the separate index-numbers
in 1900 and $p_1', p_2' \dots$ in 1911, the general index-numbers
are $I_1 = \frac{S(p)}{n}$, $I_2 = \frac{S(p')}{n}$, and
$I = 100 \frac{I_2}{I_1}$ gives the movement from 1900 to 1911, i.e. 106.16,

$$I = 100 \frac{S(p')}{S(p)} = 100 \frac{S\left(p \cdot \frac{p'}{p}\right)}{S(p)};$$

that is, the ratios of the separate
changes are weighted with the separate index-numbers in 1900.

We will examine the accuracy of the average on Sauerbeck's
system, that is taking $p_t$, now written $W$, as a weight,
and $100 \frac{p_t'}{p_t}$, now written $M$, as a quantity.

The quantities involved are the following:

$w = 75.07$, $\sigma_w = 20.67$, $\frac{\sigma_w}{w} = 0.275$, $m = 107.82$, $\sigma_m = 20.03$,
$\frac{\sigma_m}{m} = 0.186$, $r = -0.2944$, $m_w = 106.2$, $r_{12} = 0.506$, $r_{21} = -0.347$,
$R_{22} = 0.936$.

If we neglect $r$, $r_{12}$, $r_{21}$, $R_{22}$, as in formula (75),

$$s^2 = \frac{\sigma^2}{45}(1.076)(1.035) + \frac{\sigma'^2}{45}(1.076) \times (0.186)^2 = \sigma^2 \times 0.025 + \sigma'^2 \times 0.00083.$$

If we include these quantities $s^2 = \sigma^2 \times 0.024 + \sigma'^2 \times 0.0012$.


The difference between the two is almost solely due to $r_{12}$,
i.e. to mean $w m^2$; abnormal increases from 1900 to 1911,
measured by $m$, are on the whole found with abnormal movements
from the base 1867-77 measured by $w$; but even this
influence has not much effect.

The error, $\sigma$, in $m$ is almost solely due to using round
numbers, and tends to be about [fraction illegible], and hence
$\sigma^2 \times 0.024 = (0.0005)^2$,

and is negligible.

The error in $w$ could be computed if we had a definite
system of assigning importance to the commodities. In
default of this, suppose they ought to have had equal weights,
as in the alternative computation above. Then $\frac{\sigma_w}{w} = 0.275$
measures the dispersion of the actual weights from the supposed
true weights, and $\sigma'^2 \times 0.0012 = (0.275 \times 0.034)^2 = (0.0093)^2$.

Hence $s^2 = (0.0005)^2 + (0.0093)^2 = (0.0093)^2$ approx., and the
index-number may be written

$106.2\,(1 \pm 0.0093) = 106.2 \pm 1$,

and this shows the kind of margin we should have in mind
when using index-numbers.

Actually the difference between the numbers calculated on
the two hypotheses is 107.8 − 106.2 = 1.6.

Comparison of Averages. 

If the errors in the two investigations are quite independent
and lead to averages in the form $A_1(1 \pm \sigma_1)$,
$A_2(1 \pm \sigma_2)$, the standard deviation of $A_1/A_2$ is $\sqrt{\sigma_1^2 + \sigma_2^2}$, by
formula (66).

But it often happens that errors in the same sense (both
positive or both negative) are made in corresponding items at
both dates; thus the wages of a class may be underestimated
at both dates. In such cases the error is reduced by the comparison.

Thus to take the case of a simple quotient $Q = F_1 \div F_2$.

If $e_1$ and $e_2$ are the relative errors in $F_1$, $F_2$ and their standard
deviations are $\sigma_1$, $\sigma_2$, then the error in $Q$ has standard deviation
$\sqrt{\sigma_1^2 + \sigma_2^2}$ by formula (66).

But, if $d = e_1 - e_2$, mean $d^2$ = mean $e_1^2$ + mean $e_2^2$ − 2 mean $e_1 e_2$.
The last term only vanishes if all values of $e_2$ are equally likely to
occur with any value of $e_1$, and not if $e_1$ and $e_2$ are likely to be of
the same sign.

E.g. if $e_2 = \tfrac{1}{2} e_1$ always, $\sigma_2^2 = \tfrac{1}{4}\sigma_1^2$, mean $e_1 e_2 = \tfrac{1}{2}$ mean $e_1^2 = \tfrac{1}{2}\sigma_1^2$,
and mean $d^2 = \sigma_1^2 + \tfrac{1}{4}\sigma_1^2 - \sigma_1^2 = \tfrac{1}{4}\sigma_1^2$, and the standard deviation of the
ratio is $\tfrac{1}{2}\sigma_1$.
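
The effect of errors "in the same sense" can be made concrete by writing mean $e_1 e_2$ as $\rho\sigma_1\sigma_2$. The Python sketch below is illustrative only: the correlation coefficient `rho` and the function name are assumptions introduced here, and the errors are taken to have zero mean.

```python
import math

def ratio_relative_sd(sd1: float, sd2: float, rho: float = 0.0) -> float:
    """Standard deviation of d = e1 - e2, the relative error of F1 / F2.

    mean d^2 = sd1^2 + sd2^2 - 2 * rho * sd1 * sd2, where rho is the
    (assumed) correlation between the two relative errors.
    """
    variance = sd1 ** 2 + sd2 ** 2 - 2.0 * rho * sd1 * sd2
    return math.sqrt(max(variance, 0.0))

sigma1 = 0.04
independent = ratio_relative_sd(sigma1, sigma1 / 2.0)        # formula (66)
tied = ratio_relative_sd(sigma1, sigma1 / 2.0, rho=1.0)      # e2 = e1 / 2 always
print(independent, tied)
```

With `rho = 1` and $\sigma_2 = \tfrac{1}{2}\sigma_1$ the result is $\tfrac{1}{2}\sigma_1$, as in the text; with `rho = 0` it is the larger value given by formula (66).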

The necessary analysis for the ratios of weighted and unweighted 
averages is given in the Appendix, Notes 7 and 8. 

The approximate formulae are as follows, the notation being 
as on the previous page. 

If $s_r$ is the standard deviation of the ratio of two unweighted
averages,
where $\sigma_d$ is the standard deviation of the difference between $e_t$ and
$e_t'$, the errors in measurements of the corresponding quantities
$M_t$, $M_t'$ at the two dates.

While if $s_r$ is the standard deviation of the ratio of two
weighted averages, then approximately under certain conditions

: K 1 + + (iTs) ("* + "'*)} • • (77) 

where $\sigma_d$ is as before, $\sigma$ and $\sigma'$ are standard deviations of the errors
in quantities and weights, $M_t' = (1 + a + u_t) M_t$, where $S u_t = 0$, so
that $1 + a$ measures the mean rate of growth of the quantities,
and $\sigma_u$ is the standard deviation of $u$ and measures the scattering
of the rates of growth.

If then the errors in the quantities tend to be the same at
both periods, the first term in the bracket { } in (77) is small,
and if the quantities grow at nearly the same rate the second
term is small. In any case the standard deviation diminishes
with $\frac{1}{\sqrt{n}}$.

Under conditions which are often fulfilled it follows that
very great accuracy can be obtained in the ratio of weighted
averages, though the original errors in the measurement of
quantities and in the systems of weighting are considerable.
It is important not to vary the methods of computation, so
as to obtain similar errors and a small value of $\sigma_d$.


Example.

Data for Estimating the Change in Average Weekly Wages in Certain
Industries in the United Kingdom.

                               1880.                    1900.
                        Numbers     Wages        Numbers     Wages       Ratio of increase
                           W          M             W'         M'        of M, 1 + a + u.
                        0000's.   shillings.     0000's.   shillings.
  Agriculture:
    England and Wales     135        15            120        16.2            1.08
    Scotland               24        18             20        21.2            1.18
    Ireland                98         9             86        10.4            1.16
  Building                 84        27            123        31.0            1.15
  Printing                  8        31             13        32.9            1.06
  Shipbuilding              7        28.5           13        34.8            1.22
  Engineering              72        25            106        30.5            1.22
  Coal                     44        23             75        34.3            1.49
  Puddling                  9        31             11        38.1            1.23
  Cotton                   52        16             54        19.5            1.22
  Wool and worsted         12        14             12        13.6            0.97
  Worsted                  12        14             12        14.4            1.03
  Gas                       3        27              8        31.0            1.15
  Furniture                12        23             18        24.8            1.08

The numbers are of all engaged in the industries from the
General Report of the Census of England and Wales, Table 35.
The rates of increase are from Mr. G. H. Wood's paper in the
Statistical Journal, 1909, p. 93. The average wages are
computed from various sources; the accuracy of the ratios
is more important than the accuracy of M.


$n = 14$, $m = 21.54$, $m' = 25.20$, $m_w = 18.69$, $m_w' = 24.09$,

$\frac{\sigma_m}{m} = 0.319$, $\frac{\sigma_m'}{m'} = 0.351$, $\frac{\sigma_w}{w} = 1.00$, $\frac{\sigma_w'}{w'} = 0.90$, $a = 0.160$, $\sigma_u = 0.12$,

$r = -0.42$, $r_{12} = -0.44$, $r_{21} = -0.42$, $R_{22} = 0.25$.

Ratio of unweighted averages $= \frac{m'}{m} = 1.170$; of weighted
averages $\frac{m_w'}{m_w} = 1.288$.

Mr. Wood gives $\frac{m'}{m} = 1.163$ and $\frac{m_w'}{m_w} = 1.219$ for these, using
different weights.

$s_r^2 = \dots\,\sigma_d^2 + 0.0015\,(\sigma^2 + \sigma'^2)$,

by the approximate formula (77), and by the full formula (148),
App.,

$$s_r^2 = 0.145\,\sigma_d^2 + 0.022\,\sigma^2 + 0.0035\,\sigma'^2 + 0.016\,\sigma_d'^2,$$

where $\sigma_d'$ measures the difference between the errors in the weights
at the two dates.


The approximate formula fails to do justice to the error in 
quantities owing to the great change in the weights in the 
period whose effect is ignored. 

To see the effect of these errors, suppose the error in the
wages in 1880 ($\sigma$) is $\frac{1}{20}$ and in the weights ($\sigma'$) is $\frac{1}{10}$, and that
similarity of error makes $\sigma_d = \tfrac{1}{2}\sigma$ and $\sigma_d' = \tfrac{1}{2}\sigma'$.

Then $s_r^2 = 0.000091 + 0.000055 + 0.000035 + 0.000040 = 0.00022$.

$s_r = 0.015$.

The ratio of the averages may be written

$\frac{m_w'}{m_w}(1 \pm s_r) = 1.288 \pm 0.020$,

i.e. the percentage increase instead of being 29 may be anywhere
from 27 to 31.
Actually the elemental errors may be larger than those
here supposed. These figures are given as an example of
method and to show the influence of the various terms; but
n = 14 is too small for the theory to be closely applicable,
and a serious study of general wage-changes would need a
wider range of industries and more exact determination of the
numbers and average wages.


Significance of Differences Between Averages . 

A very important problem that frequently arises in practical
statistics is to determine whether the difference found between
two averages of similar classes or groups could be due to the
error incident to observation (especially to the inclusion of
too small a number in a random sample) or can safely be
attributed to real differences of characteristics. E.g., if the
observed death-rates of two classes are 14.7 and 14.3 per 1000,
are we justified in saying that the death-rate of the first class
is the higher, or should we expect a difference of 0.4 if we simply
separated two parts of the population arbitrarily?

If the observed difference is greater than is to be expected 
in chance selection, it is said to be significant, i.e. significant 
of a real difference between the phenomena. 

The general method of analysis is as follows: Suppose two
classes containing $n_1$ and $n_2$ things yield averages $x_1$ and $x_2$.
Calculate the standard deviation of the frequency curve of
the differences between the averages of $n_1$ things and $n_2$
things selected indiscriminately from the whole universe from
which the classes were segregated, and let this be $\sigma$.

Compare $x_1 \sim x_2$ with $\sigma$. The chance that the ratio is
greater than 3 is 0.0027, since the sum of the integrals of

$$\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}\,dz \text{ from } 3 \text{ to } \infty \text{ and } -3 \text{ to } -\infty,$$

i.e. $2(0.5 - F(3)) = 2(0.5 - 0.49865) = 0.0027$ (p. 271).

Similarly the chances that the ratio is greater than 2 or 1
are 0.0456 or 0.3174; and it is just as likely as not that the
ratio is as great as 0.674. If, then, $x_1 \sim x_2$ is not greater than
$0.674\sigma$, there is no evidence of a real difference, that is, a
difference due to the nature of the classes and not attributable
to chance deviation. As $x_1 \sim x_2$ increases beyond this, the
improbability of the result as a chance event increases, till
when the ratio equals 2 the odds are about 21 to 1 (0.9544 to
0.0456) against. At $2\sigma$ we may say that the event is improbable
unless the difference is real. At $3\sigma$ the odds against are about
370 to 1, and this is generally regarded as so improbable that
the difference $x_1 \sim x_2$ is spoken of as significant. At $4\sigma$ the
odds against are about 15,000 to 1. We can, of course, never
arrive at certainty by this method; we have rather to connect
the word significant with the scale of probability. In the
following paragraphs rules are given for calculating $\sigma$; in
every case the frequency group of the errors is normal, since
the conditions described in previous sections are satisfied, and
in every case the connection between $\sigma$ and the probability
of chance occurrence is that described in this paragraph.
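
The chances and odds quoted in this paragraph follow from the normal integral and can be recomputed with the complementary error function. This is a small Python sketch added for illustration, not part of the original text; the function names are assumptions.

```python
import math

def two_sided_tail(k: float) -> float:
    """Chance that a normal deviation exceeds k standard deviations
    in either direction: 2 * (1 - Phi(k)) = erfc(k / sqrt(2))."""
    return math.erfc(k / math.sqrt(2.0))

def odds_against(k: float) -> float:
    """Odds against a chance deviation at least k standard deviations large."""
    p = two_sided_tail(k)
    return (1.0 - p) / p

for k in (0.674, 1.0, 2.0, 3.0, 4.0):
    print(f"{k:5.3f} sigma: chance {two_sided_tail(k):.5f}, "
          f"odds against about {odds_against(k):.0f} to 1")
```

The values agree with the table on p. 271 to the precision quoted: roughly even chances at $0.674\sigma$, about 21 to 1 at $2\sigma$, about 370 to 1 at $3\sigma$, and rather more than 15,000 to 1 at $4\sigma$.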


A . — Cases of the Proportion of Things with Particular 
Characteristics in a Universe. 

1. Let $N$ be the number of things in a universe, of which $pN$
have a particular characteristic where $p$ and $N$ are known. $q = 1 - p$.

Let $n$ be selected at random, and $p'n$ be found to have the
characteristic.

Then $\sigma$ for $p' \sim p$ is $\sqrt{pq\left(\frac{1}{n} - \frac{1}{N}\right)}$ or $\sqrt{\frac{pq}{n}}$, if $\frac{n}{N}$ is negligible.

Example. If dice are thrown 1200 times and 6 turns up 180
times: $N = \infty$, $p = \frac{1}{6}$, $n = 1200$, $p' = \frac{3}{20}$. $\sigma = \sqrt{\frac{1}{6} \cdot \frac{5}{6} \cdot \frac{1}{1200}} = 0.0108$,

$$\frac{p \sim p'}{\sigma} = \frac{0.0167}{0.0108} = 1.5,$$

and there is an indication but no proof that the dice are not uniform
in respect of their 6 faces.
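
Case 1 can be written as a short routine. The following Python sketch is illustrative and not part of the original text; the function name and the convention that `N=None` stands for an unlimited universe are assumptions. It reworks the dice example.

```python
import math
from typing import Optional

def proportion_sd(p: float, n: int, N: Optional[int] = None) -> float:
    """Standard deviation of a sample proportion when the universe
    proportion p is known: sqrt(p * q * (1/n - 1/N)).
    N=None is used here to mean that n/N is negligible."""
    finite_term = 0.0 if N is None else 1.0 / N
    return math.sqrt(p * (1.0 - p) * (1.0 / n - finite_term))

p, n, sixes = 1.0 / 6.0, 1200, 180
sd = proportion_sd(p, n)
ratio = abs(sixes / n - p) / sd
print(f"sigma = {sd:.4f}, deviation / sigma = {ratio:.2f}")
```

The ratio comes out a little above one and a half, well short of the value 3 at which the difference would be called significant.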

2. Let two samples $(n_1, p_1)$, $(n_2, p_2)$ be selected from the universe,
and neglect $\frac{n_1}{N}$, $\frac{n_2}{N}$.

The standard deviations of $p_1 - p$ and $p_2 - p$ are $\sqrt{\frac{pq}{n_1}}$ and $\sqrt{\frac{pq}{n_2}}$.

Hence the standard deviation of $p_1 \sim p_2 = (p_1 - p) \sim (p_2 - p)$
is the square root of the sum of the squares of their separate
standard deviations (formula (34)) and

$$= \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}. \qquad (78)$$

If $p$ is not known, but can only be deduced from the samples,
the best value to take seems to be that found by merging the
samples, viz.: $p(n_1 + n_2) = p_1 n_1 + p_2 n_2$.

Example. In 1000 houses selected in a town, in 200 ($n_1$) the
head of the household is an artisan, in 800 ($n_2$) a labourer.
Children of school age are present in 80 of the first group ($p_1 = 0.4$)
and 420 of the second ($p_2 = 0.525$). (The numbers are hypothetical.)

$p \times 1000 = 80 + 420$, $\therefore p = \tfrac{1}{2} = q$.

$\sigma$ for $p_1 \sim p_2 = \sqrt{\tfrac{1}{2} \cdot \tfrac{1}{2}\left(\frac{1}{200} + \frac{1}{800}\right)} = 0.04$.

$$\frac{p_2 - p_1}{\sigma} = \frac{0.525 - 0.4}{0.04} = 3 \text{ approx.}$$

The difference is significant.
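
Formula (78), with $p$ estimated by merging the two samples, can be sketched in Python as follows. This is an illustration added here, not the author's own procedure in code form; the function name and argument order are assumptions.

```python
import math

def pooled_proportion_test(x1: int, n1: int, x2: int, n2: int):
    """Compare two sample proportions drawn from one universe.

    x1, x2 are the counts with the characteristic in samples of n1, n2.
    Returns the merged proportion p, the standard deviation of p1 ~ p2
    by formula (78), and the ratio of the observed difference to it.
    """
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)           # p (n1 + n2) = p1 n1 + p2 n2
    sd = math.sqrt(p * (1.0 - p) * (1.0 / n1 + 1.0 / n2))
    return p, sd, abs(p1 - p2) / sd

# Households example: 80 of 200 artisans, 420 of 800 labourers.
p, sd, ratio = pooled_proportion_test(80, 200, 420, 800)
print(f"p = {p:.3f}, sigma = {sd:.4f}, ratio = {ratio:.2f}")
```

Carried to more places than the text, the ratio is slightly above 3, which agrees with the conclusion that the difference is significant.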

3. The samples $(n_1, p_1)$, $(n_2, p_2)$ are selected from different unknown
universes, $n_1$ and $n_2$ being large.

E.g., suppose that out of 1000 men selected from each of two countries,
300 and 250 respectively are found to have blue eyes.

Here $p_1 = \frac{3}{10}$ in the selection, with $\sigma_1 = \sqrt{\frac{3 \times 7}{10^5}} = 0.014$, and
the value for the whole country (if the selection had nothing to do
with race or climate within the country) is approximately $p_1$.

Similarly in the other country it is approximately $p_2 = \tfrac{1}{4}$.

The standard deviation for $p_1 \sim p_2$ is that for the difference
between two independent groups, viz.:

$$\sigma = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = 0.02 \qquad (79)$$

$$\frac{p_1 - p_2}{\sigma} = \frac{0.3 - 0.25}{0.02} = 2.5.$$

This method is generally used when the death-rates of two
occupational classes (e.g., miners and bricklayers) are compared.

If of $n_1$, $n_2$ under observation $m_1$ and $m_2$ die in a year, the
rates ($r_1$, $r_2$) are $\frac{m_1}{n_1} \times 1000$, $\frac{m_2}{n_2} \times 1000$, and $p_1 = \frac{m_1}{n_1}$, $p_2 = \frac{m_2}{n_2}$,
since in the absence of other evidence it is assumed that the risk is
the same throughout each class.
The miners are then assumed to be a random sample of a
universe of miners, and similarly with the bricklayers.

Then $\sigma$ for $r_1 - r_2$

$$= 1000\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = 1000\sqrt{\frac{m_1(n_1 - m_1)}{n_1^3} + \frac{m_2(n_2 - m_2)}{n_2^3}}.$$

A simpler procedure, however, is to compare each class
with the adult male population as a whole. Then to find if
the miners' death-rate differs from that of occupations in
general we should use Case 1.

In the preceding it has been assumed that the chance p 
was the same throughout the universe. It may happen, 
however, that the universe consists of different regions or 
strata in which the chances are different, and the question 
arises whether we should proceed at random in the selection 
of a sample out of the universe as a whole or whether we 
should partially arrange the choice so as to take the same pro- 
portion out of each region or stratum. Mr. Yule (Theory of 
Statistics , p. 281) gives a formula which may be established 
as follows.* 


Let a universe contain $n_1, n_2 \dots n_t$ things in $t$ strata, and let the
numbers which have a certain characteristic be $p_1 n_1, p_2 n_2 \dots p_t n_t$
in these strata.

$N = n_1 + n_2 + \dots + n_t$, and let $p_1 n_1 + p_2 n_2 + \dots = PN$.

Let $kn_1, kn_2 \dots kn_t$ be examined in the $t$ strata, i.e. $kN = n$
in all.

Write $p_1 = P + d_1$, $p_2 = P + d_2 \dots$,
so that $S(nd) = 0$.

The standard deviation of $p_1$ in the sample is $\sqrt{\frac{p_1 q_1}{k n_1}}$,
and similarly for $p_2$ etc.

Hence if $\sigma$ is the standard deviation for $P$ in the sample,
by formula (55),

$$\sigma^2 = \left(\frac{n_1}{N}\right)^2 \frac{p_1 q_1}{k n_1} + \left(\frac{n_2}{N}\right)^2 \frac{p_2 q_2}{k n_2} + \dots = \frac{1}{k N^2}(n_1 p_1 q_1 + n_2 p_2 q_2 + \dots)$$

$$\therefore nN\sigma^2 = n_1 p_1 (1 - p_1) + \dots = S(np) - S(np^2) = NP - S\{n(P + d)^2\}$$

$$= NP - NP^2 - 2P \cdot S(nd) - S(nd^2) = NPQ - N\sigma_p^2,$$

where $\sigma_p^2 = \frac{1}{N}(n_1 d_1^2 + n_2 d_2^2 + \dots)$.

$$\therefore \sigma^2 = \frac{PQ}{n} - \frac{\sigma_p^2}{n}. \qquad (80)$$

Here $\sigma$ is the standard deviation for the observed result, $P$ is
the actual proportion in the universe, and $\sigma_p^2$ is the weighted mean
square of the deviations in the strata.

If we took the numbers at random through the universe the
standard deviation of the error would be $\sigma_0$, where $\sigma_0^2 = \frac{PQ}{n}$.

Hence $\sigma^2 = \sigma_0^2 - \frac{\sigma_p^2}{n}$, and by choosing proportionally from the
various strata the standard deviation of the error involved is
diminished.


In the investigation of the economic conditions of 4 towns
(Livelihood and Poverty), instead of numbering all the houses
and selecting 1 in 20 at random, we marked one out of every 20
throughout each street. By this means we secured that no
district was completely unrepresented, which may possibly
happen in a random selection, and we also got the advantage
indicated by the formula just given, since social conditions in
a street have a certain similarity. Suppose that there were
16,000 houses in 10 equal wards, and that in these wards the
proportions below some assigned standard were 0.02, 0.06, 0.10
. . . 0.38. Then $N = 16000$; $n_1 = n_2 = \dots = 1600$; $p_1 = 0.02$,
$p_2 = 0.06 \dots$; $P = 0.2$; $d_1 = -0.18$, $d_2 = -0.14 \dots$;
$\sigma_p^2 = \frac{1}{10}(0.18^2 + 0.14^2 + \dots)$, $\sigma_p = 0.115$.

Now suppose 80 houses were examined in each ward,
$n = 800$, $k = \frac{1}{20}$, $\sigma^2 = \frac{0.2 \times 0.8}{800} - \frac{(0.115)^2}{800}$, $\sigma = 0.0136$, and the result
may be written $0.20 \pm 0.0136$, or $20 \pm 1.36$ per cent.

In a non-stratified selection we should have had $\sigma = 0.0141$.
The gain in precision is very slight, but the method of selection
by strata is in accordance with common sense and should be
used where it is applicable.
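
Formula (80) and the ward example can be recomputed with a few lines of Python. The sketch below is illustrative only and not part of the original text; the function name and its return values are assumptions.

```python
import math

def stratified_proportion_sd(sizes, proportions, k):
    """Proportional (stratified) selection of a fraction k from each stratum.

    sizes:       numbers n_1 ... n_t in the strata
    proportions: p_1 ... p_t having the characteristic in each stratum
    Returns P, sigma_p, the stratified sigma of formula (80), and the
    sigma_0 of an unrestricted random selection of the same total size.
    """
    N = sum(sizes)
    n = k * N
    P = sum(s * p for s, p in zip(sizes, proportions)) / N
    var_p = sum(s * (p - P) ** 2 for s, p in zip(sizes, proportions)) / N
    sd_random = math.sqrt(P * (1.0 - P) / n)
    sd_stratified = math.sqrt(P * (1.0 - P) / n - var_p / n)
    return P, math.sqrt(var_p), sd_stratified, sd_random

# Ten equal wards of 1600 houses, proportions 0.02, 0.06, ... 0.38, 1 in 20 taken.
wards = [1600] * 10
props = [0.02 + 0.04 * i for i in range(10)]
P, sigma_p, sd_strat, sd_rand = stratified_proportion_sd(wards, props, 1.0 / 20.0)
print(f"P = {P:.2f}, sigma_p = {sigma_p:.3f}, "
      f"stratified = {sd_strat:.4f}, random = {sd_rand:.4f}")
```

The stratified value is always the smaller of the two, since $\sigma_p^2/n$ cannot be negative; how much smaller depends on how widely the strata differ.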

B . — Case of a Universe Containing a Number of 
Measurable Objects. • 


1. Let there be $N$ objects in the universe, the average of whose
measurements is $x$ and standard deviation $s$.

$n$ are selected at random and their average is found to be $x_1$.

Then the standard deviation, $\sigma$, of $x_1 \sim x$ is $s\sqrt{\frac{1}{n} - \frac{1}{N}}$, by
formula (52).


Example. The average number of persons per tenement in a
town of 10,000 tenements is 4.5, with standard deviation 2.

In 1000 working-class tenements the average is 4.7.

Here $N = 10{,}000$, $n = 1000$, $x = 4.5$, $x_1 = 4.7$, $s = 2$,

$$\sigma = 2\sqrt{\frac{1}{1000} - \frac{1}{10000}} = 0.06, \qquad \frac{x_1 - x}{\sigma} = 3.3.$$

2. The universe is only known by a sample, $n$, $x$, $s$. A sub-sample
of $n_1$ gives $x_1$, $\sigma_1$.

Let $n_2$, $x_2$, $\sigma_2$ be the residue, which, if the first sample were
random and not of a class with a special average, would also be an
independent random sample from the unknown universe.

Then $n_1 + n_2 = n$, $n_1 x_1 + n_2 x_2 = nx$.

$$\therefore \frac{x - x_1}{n_2} = \frac{x_2 - x}{n_1} = \frac{x_2 - x_1}{n}.$$

Also moments about the origin give

$$n(s^2 + x^2) = n_1(\sigma_1^2 + x_1^2) + n_2(\sigma_2^2 + x_2^2).$$

Standard deviation for $x_1 \sim x_2$ is $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$.

But the ratio of $x \sim x_1$ to $x_1 \sim x_2$ is constant and equal to $\frac{n_2}{n}$;
$\therefore$ standard deviation for $x \sim x_1$ is $\sigma$ where

$$\sigma^2 = \frac{n_2^2}{n^2}\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right) = \frac{\sigma_1^2}{n_1} + \frac{s^2 - 2\sigma_1^2}{n} - \frac{n_1}{n n_2}(x - x_1)^2,$$

as can be shown by eliminating $\sigma_2$ and $x_2$.

Let $n_1$ be less than $n_2$, and $n$ and $n_2$ great.

Then $(x - x_1)\sqrt{n_1}$ is of order $\sigma\sqrt{n_1}$, i.e. of $\sigma_1$. Hence the term
$\frac{n_1}{n n_2}(x - x_1)^2$ is negligible.

See Biometrika, Vol. V., p. 182.
This method is used in the Scotch Census (Cd. 7163, p. 288)
for comparing the size of families of men in different occupations.

If $\frac{n_1}{n}$ is small, $\sigma = \frac{\sigma_1}{\sqrt{n_1}}$, as it should from Case 1 when
$\frac{n}{N}$ is neglected, and the observed $\sigma_1$ is taken for the unknown $s$.

If $\sigma_1 = s$, as will be the case if the standard deviation is
not affected by class, but only the average affected,
$\sigma = \sigma_1\sqrt{\frac{1}{n_1} - \frac{1}{n}}$, as was to be expected.

Example from the Scotch Census.

$n$, total number of marriages, = 133,960.

$x$, average number of children per marriage, = 5.82, with
$s = 3.099$.

Among boiler-makers, $n_1 = 923$, $x_1 = 6.00$, $\sigma_1 = 3.039$.

$$\sigma = \sqrt{\frac{(3.039)^2}{923} + \frac{9.60 - 18.46}{133960}} = 0.10.$$

$$\frac{x_1 - x}{\sigma} = \frac{0.18}{0.10} = 1.8, \text{ which is barely significant.}$$
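
The sub-sample formula of Case 2 can be sketched in Python as below. This is a hedged illustration, not part of the original text: the function name is an assumption, and the sign of the small correction term in $(x - x_1)^2$ is taken from carrying out the elimination of $\sigma_2$ and $x_2$ described above, since the printed formula is damaged in this copy. The correction is applied only when both averages are supplied.

```python
import math

def subsample_sd(n, s, n1, sd1, x=None, x1=None):
    """Standard deviation of x ~ x1 when a sub-sample (n1, x1, sd1) is
    compared with the whole sample (n, x, s) of which it forms part.

    sigma^2 = sd1^2 / n1 + (s^2 - 2 * sd1^2) / n - n1 * (x - x1)^2 / (n * n2)
    The last term is usually negligible and is skipped if x or x1 is None.
    """
    variance = sd1 ** 2 / n1 + (s ** 2 - 2.0 * sd1 ** 2) / n
    if x is not None and x1 is not None:
        n2 = n - n1
        variance -= n1 * (x - x1) ** 2 / (n * n2)
    return math.sqrt(variance)

# Scotch Census example: all marriages against boiler-makers.
sigma = subsample_sd(133960, 3.099, 923, 3.039)
sigma_full = subsample_sd(133960, 3.099, 923, 3.039, x=5.82, x1=6.00)
print(f"sigma = {sigma:.4f}, with correction = {sigma_full:.4f}, "
      f"ratio = {(6.00 - 5.82) / sigma:.2f}")
```

For this example the correction term changes $\sigma$ only in the seventh decimal place, which supports the statement that it may be neglected when $n$ and $n_2$ are great.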


Example. Among the flour prices tabulated on p. 311 for U.S.A.,
142 came from North Atlantic States.

                      Number.        Average.        Standard Deviation.
  U.S.A.              n = 267        x = 2.625       s = 0.293
  North Atlantic      n_1 = 142      x_1 = 2.748     σ_1 = 0.244

$$\sigma = \sqrt{\frac{0.0595}{142} + \frac{0.0858 - 0.1190}{267}} = 0.017.$$

$\frac{x_1 - x}{\sigma} = 7$ approx., and the price in the North Atlantic
States was definitely higher than the average for the whole country.

3. Two samples $(n_1, x_1, \sigma_1)$ and $(n_2, x_2, \sigma_2)$ are taken out of two
known universes $(N_1, x_1', s_1)$ and $(N_2, x_2', s_2)$.

$\sigma$ for $x_1 \sim x_2$ is then


7 {*■■ Cr, - r .) + - i)} - J (2f + ’-£) 


since we have the difference between two independent observations,
each coming under Case 1.
If the universes are only known from samples, and $\frac{n_1}{N_1}$, $\frac{n_2}{N_2}$ are
small, we must take

$$\sigma = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}. \qquad (82)$$


There are some other variants, which can be treated on the 
same principles. 

Example. In food expenditures, similar to those tabulated on
p. 310, in the whole group we have $\bar{x}$ = 10.3 (shillings), $s$ = 3.3.

$\bar{x}_1$ for the families of 566 skilled workmen was 10.9, and $\bar{x}_2$ for
the families of 266 unskilled was 9.3.

The standard deviations for these groups were not calculated,
but were probably nearly the same as $s$.

    $\sigma$ for $\bar{x}_1 - \bar{x}_2$ is $3.3\sqrt{\frac{1}{566} + \frac{1}{266}} = .25$.

    $\frac{\bar{x}_1 - \bar{x}_2}{\sigma} = \frac{1.6}{.25} = 6$ approx., and the difference is significant.
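
A minimal Python sketch of the same calculation follows. The helper `diff_of_means_sd` is a hypothetical name, and the code assumes, as the example does, that both groups have the standard deviation of the whole.

```python
import math

def diff_of_means_sd(sd1, n1, sd2, n2):
    """SD of the difference between the averages of two independent samples."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

s = 3.3  # shillings; taken for both groups, as in the example
sd_diff = diff_of_means_sd(s, 566, s, 266)
ratio = (10.9 - 9.3) / sd_diff

print(round(sd_diff, 2), round(ratio, 1))
```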


The stratification of a universe of measurable objects is
also treated by Mr. Yule (Theory, p. 345).

Let a universe $(N, \bar{x}, s)$ be composed of groups $(n_1, \bar{x}_1, s_1)$, $(n_2, \bar{x}_2, s_2)$, ...,
and let $kn_1, kn_2, \ldots$ be selected from the groups, and the averages
be found to be $(\bar{x}_1 + \delta_1), (\bar{x}_2 + \delta_2), \ldots$, and the average of the $kN$
to be $\bar{x} + D$; $kN = n$.

Then $N = n_1 + n_2 + \ldots$; $N\bar{x} = n_1\bar{x}_1 + n_2\bar{x}_2 + \ldots$

Write $\bar{x}_1 = \bar{x} + d_1$, $\bar{x}_2 = \bar{x} + d_2$, ...;
then $S(nd) = 0$.

    $Ns^2 = n_1(s_1^2 + d_1^2) + n_2(s_2^2 + d_2^2) + \ldots$

The squares of the standard deviations for $\delta_1, \delta_2, \ldots$ are

    $\frac{s_1^2}{kn_1}, \frac{s_2^2}{kn_2}, \ldots$

Write $\sigma$ for the standard deviation of $D$.

    $kN(\bar{x} + D) = kn_1(\bar{x}_1 + \delta_1) + kn_2(\bar{x}_2 + \delta_2) + \ldots$

    $\therefore D = \frac{n_1}{N}\delta_1 + \frac{n_2}{N}\delta_2 + \ldots$

    $\therefore \sigma^2 = \left(\frac{n_1}{N}\right)^2 \frac{s_1^2}{kn_1} + \left(\frac{n_2}{N}\right)^2 \frac{s_2^2}{kn_2} + \ldots$, by formula (55),

    $= \frac{1}{nN}(n_1 s_1^2 + n_2 s_2^2 + \ldots)$, since $kN = n$.

Write $\sigma_0$ for the standard deviation of the average if the $n$
samples had been taken at random from the universe as a whole.
Then $\sigma_0^2 = \frac{s^2}{n} = \frac{1}{nN}\{n_1(s_1^2 + d_1^2) + n_2(s_2^2 + d_2^2) + \ldots\}$

    $\therefore \sigma_0^2 - \sigma^2 = \frac{S(nd^2)}{nN}$.

Write $N\sigma_m^2 = S(nd^2)$, so that $\sigma_m^2$ is the weighted mean square
of the deviations of the averages in the strata.

Then $\sigma^2 = \sigma_0^2 - \frac{\sigma_m^2}{n}$     (83)

The precision of the average is improved by stratification, as in
the previous case (formula (80)).

Thus in the example on p. 313, let $N$ be the total number
of tenements in the last seven districts named (Spitalfields
and onwards). $\bar{x}$, the average number of persons per tenement
in the districts combined, is found from the Census to be
4.64, with $s$ = 2.75. $k = \frac{1}{50}$, since one tenement in 50 was
recorded in the selection; $N$ = 57000, and $n$ = 1140.

                      Number of tenements     Persons per tenement
                           (00's).            (deviation from 4.64).
Spitalfields            $n_1$ =  65              + .15 = $d_1$
Whitechapel             $n_2$ =  59              + .08 = $d_2$
St. George              $n_3$ =  94              + .24 = $d_3$
Shadwell                $n_4$ =  48              - .27 = $d_4$
Limehouse               $n_5$ =  66              - .10 = $d_5$
Mile End, S.W.          $n_6$ = 134              + .07 = $d_6$
Mile End, N.E.          $n_7$ = 104              - .24 = $d_7$
                        Total   570

    $S(nd^2) = 1806$;  $\sigma_m^2 = \frac{1806}{57000} = .0317$

    $\sigma_0^2 = \frac{(2.75)^2}{1140} = .006634$;  $\sigma_0 = .08145$

    $\sigma^2 = .006634 - \frac{.0317}{1140} = .006606$;  $\sigma = .08128$


The improvement obtained by sampling in the seven strata 
represented by the districts is very slight. 
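
The figures of this example may be reproduced by the following Python sketch of formula (83). The variable names are illustrative assumptions, and the numbers of tenements are entered in units on the assumption that the table gives them in hundreds (they then add to $N$ = 57000).

```python
import math

N = 57000          # tenements in the seven districts
n = 1140           # tenements selected (one in 50)
s = 2.75           # standard deviation of persons per tenement

# Tenements per district (table entries taken as hundreds) and the
# deviation d of each district average from the general average 4.64.
tenements = [6500, 5900, 9400, 4800, 6600, 13400, 10400]
d = [0.15, 0.08, 0.24, -0.27, -0.10, 0.07, -0.24]

sum_nd2 = sum(ni * di**2 for ni, di in zip(tenements, d))
sigma_m_sq = sum_nd2 / N                 # weighted mean square of the d's
sigma0_sq = s**2 / n                     # unstratified random sample
sigma_sq = sigma0_sq - sigma_m_sq / n    # formula (83)

print(round(sum_nd2), round(sigma_m_sq, 4))
print(round(math.sqrt(sigma0_sq), 5), round(math.sqrt(sigma_sq), 5))
```

The two standard deviations differ only in the fourth decimal place, which is the sense in which the improvement is very slight.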


Existence of a Trend. 

Further applications of the same principles are made when 
we consider a time-series of observations and examine whether 
the fluctuations and movements are random or show the 
existence of a trend or of periodicity.
The method, and its difficulties, can be shown sufficiently
by two examples.


1. The Recorded Times for "The Oaks" from 1850 to 1899 are as Shown Below.

Year  min. sec.     Year  min. sec.     Year  min. sec.     Year  min. sec.     Year  min. sec.
1850   2   56       1860   2   56       1870   2   52       1880   2   49       1890   2   40†
1851   2   52       1861   2   44       1871   2   51       1881   2   46       1891   2   54†
1852   3    0       1862   2   49       1872   2   52       1882   2   49       1892   2   43†
1853   2   52       1863   2   54       1873   2   50†      1883   2   53       1893   2   44†
1854   3    0       1864   2   47       1874   2   48†      1884   2   49       1894   2   50
1855   2   58       1865   2   51       1875   2   49†      1885   2   43†      1895   2   48†
1856   3    4       1866   2   53       1876   2   50       1886   2   54†      1896   2   45†
1857   2   50       1867   2   54       1877   2   54†      1887   2   50†      1897   2   45
1858   2   53½      1868   2   47½      1878   2   54       1888   2   42†      1898   2   45†
1859   2   55       1869   2   59       1879   3    2       1889   2   45       1899   2   44

Ten-yearly
average 2  56.05           2  51.45            2  52.395           2  48.22            2  46.26

† The entry carries in addition a fraction of a second which is not legible in this copy.


These figures fit fairly well a normal curve with average
2 min. 50.87 secs. and standard deviation 5.20 secs. The
standard deviation for the difference between two records is
therefore $5.2\sqrt{2}$ = 7.4 secs. This is only exceeded eleven times
between consecutive years, and no difference between consecutive
years reaches twice this; hence there is no proof of any
sudden change having taken place between two races. The
difference between some of the times for years early in the
period and those later in some cases exceeds 20 seconds. The
standard deviation for the difference between the averages for
two periods of ten years is $5.2\sqrt{\frac{1}{10} + \frac{1}{10}}$ = 2.33 secs. The
difference between the averages for 1850-9 and 1890-9 is nearly
10 seconds, and is significant, as is the difference between the
averages for 1850-9 and 1880-9. The intermediate differences
are hardly significant. Hence we find that some cause was at
work which gradually quickened the race between the fifties and
the eighties.
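
The comparison of the ten-yearly averages can be sketched in Python as below. The averages and the standard deviation 5.20 secs. are those stated above; the dictionary and variable names are assumptions for illustration.

```python
import math

sd_single = 5.20  # seconds, standard deviation of a single record

# Ten-yearly averages, in seconds over 2 minutes.
decade_avg = {
    "1850-9": 56.05,
    "1860-9": 51.45,
    "1870-9": 52.395,
    "1880-9": 48.22,
    "1890-9": 46.26,
}

sd_two_records = sd_single * math.sqrt(2)
sd_two_decades = sd_single * math.sqrt(1 / 10 + 1 / 10)

# Fall from 1850-9 to each later decade, in units of its standard deviation.
ratios = {
    decade: (decade_avg["1850-9"] - avg) / sd_two_decades
    for decade, avg in decade_avg.items()
    if decade != "1850-9"
}

overall = sum(decade_avg.values()) / len(decade_avg)
print(round(sd_two_records, 1), round(sd_two_decades, 2), round(overall, 2))
for decade, r in ratios.items():
    print(decade, round(r, 1))
```

Only the falls to 1880-9 and 1890-9 exceed three times the standard deviation; those to 1860-9 and 1870-9 are near or below twice.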


2. The Marriage Rates for England and Wales from 1860 to 1909 were

Year  Rate      Year  Rate      Year  Rate      Year  Rate      Year  Rate
1860  17.1      1870  16.1      1880  14.9      1890  15.5      1900  16.0
1861  16.3      1871  16.7      1881  15.1      1891  15.6      1901  15.9
1862  16.1      1872  17.4      1882  15.5      1892  15.4      1902  15.9
1863  16.8      1873  17.6      1883  15.5      1893  14.7      1903  15.7
1864  17.2      1874  17.0      1884  15.1      1894  15.0      1904  15.3
1865  17.5      1875  16.7      1885  14.5      1895  15.0      1905  15.3
1866  17.5      1876  16.5      1886  14.2      1896  15.7      1906  15.7
1867  16.5      1877  15.7      1887  14.4      1897  16.0      1907  15.9
1868  16.1      1878  15.2      1888  14.4      1898  16.2      1908  15.1
1869  15.9      1879  14.4      1889  15.0      1899  16.5      1909  14.7

Ten-yearly
average 16.70         16.33           14.86           15.56           15.55

The average for 50 years is 15.80, and the standard deviation
for the 50 records is .89, and, taken irrespective of order,
the distribution is nearly normal. There are no sudden jumps
from one year to the next. The standard deviation for the
difference between two averages of ten years is .4, and hence
the fall from 1870-9 to 1880-9 and the subsequent rise is
significant.

The first twenty-five years show greater variation than the
second twenty-five, and we can make a finer test.

        1860-1884.                          1885-1909.
$\sigma = .894$; $\sigma\sqrt{\frac{2}{5}} = .566$.          $\sigma = .613$.

Period     Average.                 Period     Average.
1860-4      16.70                   1885-9      14.50
1865-9      16.70                   1890-4      15.24
1870-4      16.96                   1895-9      15.88
1875-9      15.70                   1900-4      15.76
1880-4      15.22                   1905-9      15.34

There is a significant fall from 1870-4 to 1885-9 and a
significant rise from 1885-9 to 1895-9.

The argument should be illustrated by a diagram, which 
will suggest to what periods the test should be applied. 
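
As a check on the arithmetic of this second example, the averages and standard deviation may be recomputed from the yearly rates. The sketch below is in Python; the entry for 1881, which is not legible in the table, is taken as 15.1 on the assumption that the ten-yearly average 14.86 is correct.

```python
import math
import statistics

# Marriage rates 1860-1909, one row per decade.
rates = [
    17.1, 16.3, 16.1, 16.8, 17.2, 17.5, 17.5, 16.5, 16.1, 15.9,  # 1860-9
    16.1, 16.7, 17.4, 17.6, 17.0, 16.7, 16.5, 15.7, 15.2, 14.4,  # 1870-9
    14.9, 15.1, 15.5, 15.5, 15.1, 14.5, 14.2, 14.4, 14.4, 15.0,  # 1880-9 (1881 inferred)
    15.5, 15.6, 15.4, 14.7, 15.0, 15.0, 15.7, 16.0, 16.2, 16.5,  # 1890-9
    16.0, 15.9, 15.9, 15.7, 15.3, 15.3, 15.7, 15.9, 15.1, 14.7,  # 1900-9
]

mean_all = statistics.fmean(rates)
sd_all = statistics.pstdev(rates)  # divisor n, as in the text
decades = [statistics.fmean(rates[i:i + 10]) for i in range(0, 50, 10)]
sd_decade_diff = sd_all * math.sqrt(2 / 10)

print(round(mean_all, 2), round(sd_all, 2), round(sd_decade_diff, 2))
print([round(a, 2) for a in decades])
```

The fall of about 1.5 from 1870-9 to 1880-9 is between three and four times the standard deviation of such a difference.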


Periodicity.

The general question of the existence of a period of a length
not predetermined is a mathematical problem, that of
harmonic analysis, and is not suitable for discussion here; but
we can test the influence of periodicity if the length of the
period is given.

Take the case of a given interval, say one year where the
records are monthly, and consider whether the differences
between, say, January and February are such as might occur
in a random choice of observations irrespective of time. Suppose
the records extend over $t$ years, so that there are $12 \times t$ in
all, that their average is $\bar{x}$ and their standard deviation from
the average $\sigma = \sqrt{\frac{S(X - \bar{x})^2}{12t}}$, where $X$ stands for any
observation.

The standard deviation of the difference between two
averages each of $t$ records selected at random is

    $\sigma\sqrt{\frac{1}{t} + \frac{1}{t}} = \sigma\sqrt{\frac{2}{t}}$,

and the chance of exceeding this deviation is found from the
table of normal probability (p. 271), if the records regarded
as a group are nearly normal, or otherwise satisfy the conditions
of p. 299. If instead of taking random selections we
compare the average of the $t$ January records with that of the
$t$ February records and find that the difference exceeds twice
or three times $\sigma\sqrt{\frac{2}{t}}$, and similarly for other months, then we
have evidence that the quantities measured are affected by
the time of year, unless the records of a month include some
quite abnormal entry.

It is not, however, easy in this method to include all the
evidence. Thus if we take the 180 records of unemployment
on p. 161 we find that the average is 4.269 and the standard
deviation is 1.924. The standard deviation for the difference
between two averages of 15 is therefore $1.924\sqrt{\frac{2}{15}}$ = .70. This
is exceeded, but not greatly, when we compare the averages for
January or December with those for April, May, June or July,
and there is no other difference which might not arise in
random selection. There is, however, cumulative evidence
which can hardly be measured. Thus the averages fall from
December through January, February, March (if we omit
the abnormal entry in 1912), and April to May, and rise month
by month from May to October. This suggests a wave motion
which the method here suggested is incapable of measuring.

Another method, also difficult to make precise, is to compare
the numbers of falls and rises from an assigned month to the
next.

         Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept. Oct.  Nov.  Dec.
          to    to    to    to    to    to    to    to    to    to    to    to
         Feb.  Mar.  Apr.  May   June  July  Aug.  Sept. Oct.  Nov.  Dec.  Jan.
Falls     12    13    10    10    5½     7     2    7½    11    10    15    10
Rises      3     2     5     5    9½     8    13    7½     4     5     0     4

Thus in 12 years the February number was less than that
for the preceding January, and in 3 years it was greater.
Where the numbers are equal, ½ is counted for each row.
Now in 15 trials in each of which + and - are equally likely,
the chance of obtaining 10 or more of like sign is about .3,
so that the movements March to April, April to May, October
to November are not very improbable in a random selection.
The chance of obtaining 12 or more of the same sign is only about .035,
and the movements from January to February, February to
March, July to August, November to December, would hardly
occur if there were no influence from the season.

The conclusion seems to be that there is a cumulative
decrease from November to March or April or May, and a
cumulative rise during the early summer.
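
The chances quoted for 15 trials follow from the binomial expansion of $(\frac{1}{2} + \frac{1}{2})^{15}$, and may be verified by a short Python sketch. The function name is an assumption for illustration; both signs are counted, since a preponderance of either falls or rises is in question.

```python
from math import comb

def chance_of_like_sign(trials, at_least):
    """Chance that at least `at_least` of `trials` equally likely signs agree.

    Both tails are counted; valid while at_least exceeds half of trials.
    """
    one_tail = sum(comb(trials, k) for k in range(at_least, trials + 1))
    return 2 * one_tail / 2**trials

p10 = chance_of_like_sign(15, 10)
p12 = chance_of_like_sign(15, 12)
print(round(p10, 3), round(p12, 3))
```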

Another example gives more definite results. The catch of
haddocks is recorded (North Sea Fisheries Investigation,
Granton) monthly for 18 years, the unit being
1 cwt. per month per vessel. The average is 172 and the
standard deviation of the 216 records is about 108.

The standard deviation for the difference between the
average of one month compared with the average of all is
$108\sqrt{\frac{1}{18} + \frac{1}{216}}$ = 26.5 approx., and for the difference between
the averages of two months is $108\sqrt{\frac{2}{18}}$ = 36, in both cases if
the selections were random and there were no seasonal
influence.

The averages recorded are:

January  .  101      April  .   83      July       .  247      October   .  227
February .  115      May    .  145      August     .  282      November  .  181
March    .  125      June   .  196      September  .  267      December  .  101

                                Year  .  172


Here January, February, April and December are more 
than twice 27 below the average, March and May are not less 
than 27 below the average, each month from July to October 
is more than twice 27 above the average, June and November 
are within 27 of the average. The conclusion is definitely 
that the season July to October is better than the season 
December to April. 

Also the movements between consecutive months are more 
than 36 in the following cases : March to April, April to May, 
May to June, June to July, September to October, October to 
November, and November to December. April is clearly the 
worst month, but it is doubtful whether August is established 
as the best. 
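
The sorting of the months described above may be sketched in Python as follows. The monthly averages are those of the table; the limits 27 and 36 are the rounded standard deviations used in the text, and the variable names are assumptions for illustration.

```python
import math

sd = 108
year_avg = 172
monthly = {
    "January": 101, "February": 115, "March": 125, "April": 83,
    "May": 145, "June": 196, "July": 247, "August": 282,
    "September": 267, "October": 227, "November": 181, "December": 101,
}

sd_month_vs_all = sd * math.sqrt(1 / 18 + 1 / 216)   # about 26.5
sd_two_months = sd * math.sqrt(2 / 18)               # 36

# Months more than twice 27 below or above the yearly average.
far_below = [m for m, v in monthly.items() if year_avg - v > 2 * 27]
far_above = [m for m, v in monthly.items() if v - year_avg > 2 * 27]

# Consecutive months whose averages differ by more than 36.
names = list(monthly)
jumps = [
    (a, b) for a, b in zip(names, names[1:])
    if abs(monthly[b] - monthly[a]) > 36
]

print(round(sd_month_vs_all, 1), round(sd_two_months, 1))
print(far_below, far_above)
print(jumps)
```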

From the original figures we have

Number   Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept. Oct.  Nov.  Dec.
  of      to    to    to    to    to    to    to    to    to    to    to    to
         Feb.  Mar.  Apr.  May   June  July  Aug.  Sept. Oct.  Nov.  Dec.  Jan.
Falls     5½    10    16     4     4     7     4    11    11    11    15     8
Rises    12½     8     2    14    14    11    14     7     7     7     3     9

Here a number greater than 11 or less than 7 is likely to be
significant.


Notes.

1. The standard deviation of an average is often given as $\frac{\sigma}{\sqrt{n-1}}$ instead
of as $\frac{\sigma}{\sqrt{n}}$ on p. 289, on the ground that we should distinguish between the
deviation from the unknown true average and that from the average of
observations.

Let $\bar{x}_0$ be the true average of a group whose standard deviation is $\sigma_0$, and
let $n$ things be selected from it which give an average $\bar{x}$, and which separately
are $X_1, X_2 \ldots$ with standard deviation $\sigma$.

Write $\bar{x} = \bar{x}_0 + d$.

The deviations of $X_1, X_2 \ldots$ from the true average are $X_1 - \bar{x}_0$, $X_2 - \bar{x}_0 \ldots$,
and the standard deviation of these is by hypothesis $\sigma_0$.

Hence $\sigma_0^2$ = Mean $(X - \bar{x}_0)^2$ = Mean $(X - \bar{x} + d)^2$ = Mean $(X - \bar{x})^2 + d^2$
$= \sigma^2 + \frac{\sigma_0^2}{n}$, since from formula (38) the standard deviation of the
average is $\sigma_0/\sqrt{n}$.

    $\therefore \sigma_0^2 = \frac{n\sigma^2}{n-1}$, and $\frac{\sigma_0}{\sqrt{n}} = \frac{\sigma}{\sqrt{n-1}} = \sqrt{\frac{S(X - \bar{x})^2}{n(n-1)}}$.

Hence the observed $\sigma$ should be divided by $\sqrt{n-1}$, not $\sqrt{n}$.
The modification is only of theoretic importance, for the difference is only
perceptible with quite small values of $n$, and $\sigma$ is liable to an error of the
same order as this difference in any case.

2. When as on p. 159 and pp. 375, 387 we measure the deviation of an
observation in a time series from the average of $t$ years of which it is the centre, we
ought to pay attention to the risk of error due to averaging, measured by
$\sigma/\sqrt{t}$, where $\sigma$ is the standard deviation of the observations in neighbouring
years. The standard deviation of the difference between an observation and
such an average is not $\sigma$ but $\sigma\sqrt{1 + \frac{1}{t}} = \sigma\sqrt{\frac{t+1}{t}}$. Since $t$ is small, the
error is perceptible, and the deviations as shown on such a diagram as
that facing p. 155 are imperfectly estimated, and the measurement of
correlation on pp. 386-7 lacks precision.



CHAPTER V. 

EMPIRICAL FREQUENCY EQUATIONS.

It cannot be assumed that frequency groups in general are 
expressible by the law of great numbers, for the particular 
complex of independent causes which leads to its equation 
cannot be postulated for observational groups in general. The 
main use of the normal curve is in its application to averages 
or other functions whose methods of generation are known. 
Its applicability to anthropometrical or biometrical groups 
must be verified for each class of measurements, and the 
question whether mental and moral characteristics are normally 
distributed needs special investigation. There is, however, a 
presumption that in very many classes the normal distribu- 
tion represents fairly the central portion of a group (from the 
centre to once or twice the standard deviation) and that the 
chance of an observation differing from the average by more 
than twice the standard deviation is not large, and conse- 
quently the table of normal frequency affords some guidance 
even in non-normal cases. 

For complete description of groups either a more elastic 
system is needed to include wider classes than are covered by 
the curve of error, or equations on an empirical basis should 
be found to fit special classes of observations. In this chapter 
we deal very briefly with equations that serve one or other of 
these purposes. 

The general method is to select a mathematical equation 
involving 2, 3 or 4 unknown constants, the constants being so 
chosen as to make the curve represented by the equation fit 
the diagram formed from the observations ; the number of 
points on the diagram being more numerous than the number 
of constants, we obtain more equations than unknowns and
the best solution has to be chosen. A usual way of meeting
such a difficulty is by the method of least squares (p. 452),
but with observational frequency curves Professor Pearson's
method is generally used, equating the moments deduced
mathematically from the equation of the curve to the moments
obtained (as in Chap. I, p. 253) from the observations. This
method has already been used (p. 305) when the average,
standard deviation, and skewness $(\bar{x}, \sigma, \kappa)$ have been obtained
from the first three moments of the observations, and it is
always used in the system described in the next paragraph.
Other methods are to obtain those constants which satisfy 
the condition that the observations would be found in a random 
sample with minimum improbability or to select a small number 
of chosen points at which the equations shall be exactly 
satisfied. 


Professor Karl Pearson's System.

It is necessary to call attention to the system of curves
introduced by Professor Karl Pearson, since the notation
involved has become general in statistical investigations, and
it is advisable to indicate their relationship to the present
treatment. For a detailed treatment, however, the reader is
referred to Mr. Elderton's book, Frequency Curves and Correlation,
and Mr. Hardy's Theory of the Construction of Tables of
Mortality.

$\mu_1, \mu_2 \ldots \mu_t \ldots$ are used to denote the successive moments
of a frequency curve. $\mu_0$, the area, is taken as unity. $\mu_1$ is zero,
if the curve is referred to the ordinate through the centre of
gravity of the curve. When this is the case, $\sigma$, the standard
deviation, is defined as $\sqrt{\mu_2}$; $\beta_1$ is written for $\frac{\mu_3^2}{\mu_2^3}$ and $\beta_2$ for $\frac{\mu_4}{\mu_2^2}$.

The equation

    $D_x y = \frac{(x + a)\,y}{b_0 + b_1 x + b_2 x^2}$     (84)

is the basis of the analysis.

This satisfies the condition that the curve should touch the
axis when $y = 0$ and also be horizontal at one other position,
namely when $x = -a$. That is, the curve has one mode.
It is found in practice that it is useless to continue the
denominator after the term $b_2 x^2$.

The integration of this equation leads to the three alternative
general forms:

    $y = y_0\left(1 + \frac{x}{a_1}\right)^{\nu a_1}\left(1 - \frac{x}{a_2}\right)^{\nu a_2}$,

    $y = y_0\left(1 + \frac{x^2}{a^2}\right)^{-m} e^{-\nu \tan^{-1}(x/a)}$,

and

    $y = y_0 (x - a)^{q_2} x^{-q_1}$,

where $y_0$ and the sets of three constants $(\nu, a_1, a_2)$, $(m, a, \nu)$,
$(a, q_1, q_2)$ are determinable by means of moments from
$a, b_0, b_1, b_2$ in the basic equation. Mr. Hardy gives an alternative
method of analysis based on an apparently simpler
notation.

When there are special values of a, b 0 , b v b 2 or special rela- 
tions between them, simpler equations involving only two 
constants, or even only one, are obtained. In all, seven 
principal types are distinguished, and Mr. Elderton shows how 
each can be fitted to appropriate observational frequency 
groups. The algebra and the arithmetic involved are some- 
what heavy. The results of the application of the method to 
food expenditure are given on p. 310. 

The equation of the normal curve of error can be written 
in the form 

(85) 

and is one of the special types. 

The second approximation to the general curve of error 
gives 



where k * is neglected, and is also a special case. 

It has been found, especially by Professor K. Pearson and
his co-workers, that unimodal observational frequency groups
can very generally be represented adequately by one or other
of the variants of the formula. Hence the calculation of the
average, $\sigma$, $\beta_1$ and $\beta_2$ from the observations forms a general
and useful way of expressing a group by four intelligible
quantities, carrying further the process by which an average
is commonly taken as representing a group.
From these quantities the equation of the curve representing
the group can be deduced in its appropriate form, and
then it is possible to interpolate values of $y$ for any value of $x$,
whatever the grading of the observations may be.

It is not proposed here to discuss how far these equations 
can be used in questions of probability, nor to consider how far 
the fundamental formula is empirical and how far it is depen- 
dent on hypotheses of chance generation. 

Professor Edgeworth's Method.

Professor Edgeworth has developed a formula based on a 
transformation of the normal curve of error which represents 
classes of cases whose skewness is too great to allow them to be 
included under the second approximation of the generalised 
law of error. It has not yet been tried sufficiently to decide 
how far it is useful for description, interpolation, or other 
purposes. (See Statistical Journal , two series of papers, com- 
mencing December, 1898, and July, 1916, respectively.) 


Professor Pareto's Equation.

The equation $D_x y = -\frac{my}{x}$, obtainable from the system
described above by taking the case where $b_0 = 0$ and
$b_2 = \frac{b_1}{a}$, represents a curve which slopes downwards to
the right for all possible values of $x$, when $m$ is positive.

In its integral form it is $\log y = -m \log x$ + const.,
or $y = Cx^{-m}$.

The area of the curve from $x$ to $\infty$ is

    $z = \int_x^\infty Cx^{-m}\,dx = \frac{C}{(m-1)\,x^{m-1}}$.

Write $\alpha$ for $m - 1$ and $A$ for $\frac{C}{m-1}$, and we have

    $y = \frac{A\alpha}{x^{\alpha+1}}$,   $z = \frac{A}{x^\alpha}$     (87)

The last equation is the simplest form of "Pareto's Law"
for incomes. Here $A$ and $\alpha$ are constants and $z$ is the aggregate
number of persons whose incomes are at or above £$x$ (or $x$
francs, etc.).

Other groups, e.g. the number of houses of various annual
values, where the number of instances and the variable are
capable of very wide ranges of values, and which are suitable
for graphing on double logarithmic scales, are also found to
conform to the same formula.

In the case of incomes, the aggregate of incomes from £$x_1$
to £$x_2$ is

    $\int_{x_1}^{x_2} xy\,dx = \frac{A\alpha}{\alpha - 1}\left(\frac{1}{x_1^{\alpha-1}} - \frac{1}{x_2^{\alpha-1}}\right)$.

The number of incomes in the range is $A\left(\frac{1}{x_1^\alpha} - \frac{1}{x_2^\alpha}\right)$.

The law is not generally found applicable to very low or
very high incomes. If it did extend to the maximum income,
we should have

    Aggregate income at or above £$x$ $= \frac{A\alpha}{(\alpha - 1)\,x^{\alpha-1}}$,

    Number of incomes at or above £$x$ $= \frac{A}{x^\alpha} = N$, say,

and hence

    Average income from £$x$ upwards $= \frac{\alpha}{\alpha - 1}\,.\,x$,

and these equations would give $A$ and $\alpha$ immediately from
records of incomes.

Pareto's equation fits the statistics of incomes of 1911-12
paying super-tax very well over the range £5,000-£55,000;
above the latter income it gives numbers in excess of those
recorded.

$\alpha$ = 1.5, $\log A$ = 9.618 are found to give a close fit.

Range of Income           Number of Incomes.
   (000's).            Calculated.    Recorded.
  £5 to £10               7,546         7,411
  10 to  15               1,890         2,029
  15 to  20                 790           787
  20 to  25                 424           438
  25 to  35                 411           382
  35 to  45                 199           186
  45 to  55                 103           107
  55 to  65                  70            56
  65 to  75                  50            37
  75 to 100                 118            55
 100 and over                83            66
  Totals                 11,700        11,554

Aggregate of incomes: Calculated, £166,000,000; recorded, £145,000,000.
If a doubly-logarithmic diagram is drawn, the range over
which a straight line is a good approximation can be seen, and
a trial value of $\alpha$ is suggested by its gradient. This value may
be tested by choosing two values for $x$, say $x_1$ and $x_2$, which
give values of $N$ represented by points lying nearly on the
empirical line, say $N_1$ and $N_2$.

Then

    $\alpha = \frac{\log N_1 - \log N_2}{\log x_2 - \log x_1}$.

If we take $x_1$ = 5,000, $x_2$ = 45,000 from the table just
given, we have $N_1$ = 11,554 and $N_2$ = 321; hence $\alpha$ = 1.63.
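
A short Python sketch of these calculations is given below, on the assumption that the law is $N = A/x^\alpha$ with the constants quoted ($\alpha$ = 1.5, $\log_{10} A$ = 9.618). The function names are hypothetical and serve only to illustrate the formulae.

```python
import math

alpha = 1.5
A = 10 ** 9.618

def number_at_or_above(x):
    """Number of incomes at or above x under Pareto's law."""
    return A / x**alpha

def number_in_range(x1, x2):
    """Number of incomes between x1 and x2."""
    return number_at_or_above(x1) - number_at_or_above(x2)

# Gradient of the doubly-logarithmic line through two recorded points.
alpha_two_point = (math.log10(11554) - math.log10(321)) / (
    math.log10(45000) - math.log10(5000)
)

print(round(number_at_or_above(5000)))
print(round(number_in_range(10000, 15000)))
print(round(alpha_two_point, 2))
```

The calculated number between £10,000 and £15,000 agrees with the table, and the two-point gradient reproduces the value 1.63 found above.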

This method, however, assumes that the number up to
the maximum income conforms to the law, which is not generally
the case; and in practice it is better to take three values
$x_1, x_2, x_3$.

Then the equation

    $f(\alpha) \equiv \frac{x_1^{-\alpha} - x_2^{-\alpha}}{x_1^{-\alpha} - x_3^{-\alpha}} = \frac{\text{number of incomes from } x_1 \text{ to } x_2}{\text{number of incomes from } x_1 \text{ to } x_3} = k$, a known quantity,

is sufficient to give $\alpha$. Suppose $\alpha$ = 1.6 is the trial value;
calculate $f(\alpha) - k$ for $\alpha$ = 1.5, $\alpha$ = 1.55, $\alpha$ = 1.60, $\alpha$ = 1.65 in
succession, and obtain by interpolation a value of $\alpha$ which
makes $f(\alpha) = k$ as nearly as possible. Then test the resulting
value against other parts of the record. Given $\alpha$, $A$ is easily
found.
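
The process of trial and interpolation may be sketched in Python as below. The choice of $x_1$ = 5,000, $x_2$ = 15,000, $x_3$ = 45,000 and the use of the recorded numbers of the super-tax table are assumptions made for illustration, as are the function names.

```python
def f(alpha, x1, x2, x3):
    """Ratio of the Pareto numbers in (x1, x2) and in (x1, x3)."""
    return (x1**-alpha - x2**-alpha) / (x1**-alpha - x3**-alpha)

def interpolate_alpha(k, trials, x1, x2, x3):
    """Linear interpolation for f(alpha) = k between neighbouring trial values."""
    for lo, hi in zip(trials, trials[1:]):
        f_lo, f_hi = f(lo, x1, x2, x3), f(hi, x1, x2, x3)
        if (f_lo - k) * (f_hi - k) <= 0:
            return lo + (hi - lo) * (k - f_lo) / (f_hi - f_lo)
    return None  # k is not bracketed by the trial values

x1, x2, x3 = 5000, 15000, 45000
# Recorded incomes from 5,000 to 15,000 and from 5,000 to 45,000.
k = (7411 + 2029) / (11554 - 321)
alpha_hat = interpolate_alpha(k, [1.50, 1.55, 1.60, 1.65], x1, x2, x3)
print(round(alpha_hat, 3))
```

With these three points the interpolated value lies between 1.50 and 1.55, close to the value 1.5 stated to give a good fit.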

Another method, which perhaps uses the data more completely,
is to use the equation,

    Average income in the range $x_1$ to $x_2$
    $= \frac{\alpha}{\alpha - 1}\left(\frac{1}{x_1^{\alpha-1}} - \frac{1}{x_2^{\alpha-1}}\right) \div \left(\frac{1}{x_1^\alpha} - \frac{1}{x_2^\alpha}\right)$.

For various workings on the formula see House of Commons
Committee on the Income Tax (H. of C. No. 365 of 1906,
pp. 220-30, 240-1, 245-6).


Makeham's Formula.

The equation $-\frac{1}{y} D_x y = a + bc^x$ leads to a formula
important in actuarial work.
For convenience in integration write

    $a = -\log s$,  $b = -\log c \times \log g$.

Then $\log y = x \log s + c^x \log g$ + const.

    $\therefore y = k s^x g^{c^x}$, where $k, s, g, c$ are constants     (88)

This is Makeham's formula where $y$ is the number of a
given generation who survive to the age $x$, and is written $l_x$.

The ratio of the number of persons dying in an interval,
$\delta x$, to the number alive at the beginning of the interval, divided
by the duration of the interval, is

    $\frac{l_x - l_{x+\delta x}}{l_x\,.\,\delta x} = -\frac{1}{l_x}\frac{dl_x}{dx}$     (89)

when the interval is indefinitely diminished. This expression
is written $\mu_x$ and is called "the force of mortality."

The differential equation of the formula then gives

    $\mu_x = a + bc^x$     (90)

and the assumption is that the force of mortality is the sum of
two quantities one of which is a constant $a$, and the other
$bc^x = \mu'_x$, say, is such that it increases in a constant geometrical
progression, for $\frac{1}{\mu'_x} D_x \mu'_x = \log c$.
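
The relation between formula (88) and the force of mortality (90) can be checked numerically. In the Python sketch below the constants $k, s, g, c$ are hypothetical values chosen only for illustration, not taken from any table; logarithms are natural.

```python
import math

# Hypothetical Makeham constants, for illustration only.
k, s, g, c = 100000.0, 0.994, 0.9996, 1.09

def survivors(x):
    """l_x = k * s^x * g^(c^x), formula (88)."""
    return k * s**x * g ** (c**x)

a = -math.log(s)
b = -math.log(c) * math.log(g)

def force_of_mortality(x):
    """mu_x = a + b * c^x, formula (90)."""
    return a + b * c**x

def numeric_force(x, h=1e-3):
    """-(1/l_x) dl_x/dx, estimated by a central difference of log l_x."""
    return -(math.log(survivors(x + h)) - math.log(survivors(x - h))) / (2 * h)

x = 50
print(force_of_mortality(x), numeric_force(x))

# The variable part b*c^x grows in geometrical progression with ratio c.
ratio = (force_of_mortality(x + 1) - a) / (force_of_mortality(x) - a)
print(ratio)
```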

A more complicated form, obtainable by writing $a + a'x$ for
$a$ above, is used by Mr. Hardy (loc. cit. p. 88) and in the
Report for 1912-13 on the Administration of the National
Insurance Act, Part I., p. 585 (Cd. 6907).

He also uses a hyperbolic equation for graduation,

    $\log \frac{y}{N} = k + \frac{m}{a + x} + \frac{n}{r + x}$,

where $y$ is the number of husbands below age $x$ and $N$ is the
total number of husbands (Construction of Mortality Tables,
pp. 50-1 and Cd. 6907, p. 595).



CHAPTER VI. 


THEORY OF CORRELATION.

Introductory. 

One of the principal classes of problems in statistics is to 
determine whether phenomena are independent of each other, 
and if not, to measure their dependence. 

In this chapter we consider principally the problem as it 
arises in connection with two or three variable quantities, the 
causes of whose variation may have something in common. 

Suppose that we have pairs of observations, e.g. the height
and span of a man, the heights of pairs of brothers, the income
and rent of a household. Let the pairs of measurements be
$(X_1, Y_1)$, $(X_2, Y_2)$, etc., and let there be a frequency group of
the X's and another of the Y's, with averages $\bar{x}$, $\bar{y}$. Then if
X and Y are completely independent, when we are told a
value of X, we shall have no knowledge about the magnitude
of the corresponding value of Y; the chance that it shall have
any particular deviation from $\bar{y}$ is simply that given by its
own frequency curve; but if there is anything common to
X and Y in the causes of their variations, the statement of
the value of an X will presumably affect the probability of the
deviations of the corresponding Y.

X and Y may of course be connected rigidly by an equation,
as, for example, X lbs. and Y kilos. may be different ways of
expressing the weight of the same body, so that X = 2.204Y,
and Y/X is constant. In the cases with which we have to
deal, however, the connection is not one of direct relation;
when X is given, Y is not determinate, but in a series of
measurements (e.g. of height) we shall find for the same X
varying values of Y.

If the average or shape of the frequency curve of the Y's
associated with a given X is not the same as that for all values
of Y when the sorting by values of X is not made, then there
is something common to the two quantities, and they are said
to be correlated.

An obvious first method of analysis is to arrange the
observed values of Y in "arrays," each array containing those
values for which the X is the same, as in an ordinary cross
table. The average $\bar{Y}_t$ of an array of Y's when $X = X_t$
would, if there were complete independence and the number
of observations were great, tend to equal $\bar{y}$, the average of
all the Y's; if it differs from $\bar{y}$ by more than would be expected
in random sampling, then there is an indication that the value
of Y is not independent of that of X.


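[Figure: axes XX' and YY' through the origin O; M on OX and N on OY; crosses (×) mark positions of Q, the averages of arrays of Y's, and circles (o) mark positions of R, the averages of arrays of X's.]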

In the figure let O represent the averages of the X's and of
the Y's. Let $x_t$, $y_t$ be the excess of $X_t$, $Y_t$ over their averages,
$\bar{x}$, $\bar{y}$, and let OM be a selected $x_t$ and MQ be the average of the
y's in that array, so that MQ = $\bar{Y}_t - \bar{y}$; and let the marks
× × .. indicate various positions of Q.

Then if Y is independent of X, × × × will lie away from XX'
only if the observations are not sufficiently numerous to give
the true averages. If Y is not independent of X, Q will tend
to have a definite locus, which a free-hand line drawn through
its various positions will approximately define. If this is
the case we can write $\bar{Y}_t - \bar{y} = f(X_t - \bar{x})$, so that when $X_t$
is given, though the actual value of $Y_t$ is not known, yet $\bar{Y}_t$,
the average of the array found in repeated selections, is
approximately determinate.

Similarly, if we take an array of X's corresponding to a
selected value of $y_t$ (ON), the averages of these arrays, such as
R, marked o o o, tend to lie on a curve $\bar{X}_t - \bar{x} = f_1(Y_t - \bar{y})$,
where $f_1$ is not the same as $f$.

The locus of Q is called the curve of "regression" of
Y on X, and that of R the curve of regression of X on Y.

It frequently happens that these curves are approximately
rectilinear, especially in the neighbourhood of O, so that
$\bar{Y}_t - \bar{y} = kx_t$ approximately, where $k$ is a constant when $x_t$
is small.

The gradient of this line, $k$, equals $\frac{MQ}{OM}$ approximately, for
any small value of OM, and may presumably be found by some
method of averaging the various values of $\frac{MQ}{OM}$. We return
to this on p. 355 and p. 364.

We can approach the problem from a different aspect as 
follows. 

Let there be n magnitudes X v X 4 . . . , and n magnitudes 
Yi, Y 2 . . . 

Select at random an X, and independently select a Y, and 
form the product XY. Then in the long run, when a particular 
X happens to be selected, the various values of Y will come 
with equal frequency, and in the long run each of the n 2 
products X^, X x Yj . . . X 2 Y x . . . X„Y n will occur with 
the same frequency. 

The sum of a very great number, N, of the products is 

$$S(XY) = S(\bar{x} + x)(\bar{y} + y) = N\bar{x}\bar{y} + \bar{x}\,Sy + \bar{y}\,Sx + Sxy.$$

Here $Sy$ tends to be $\frac{N}{n}(y_1 + y_2 + \dots + y_n) = 0$, and $Sx$ also 
tends to 0. 

$S(xy)$ tends to be 

$$\frac{N}{n^2}(x_1y_1 + x_1y_2 + \dots + x_2y_1 + \dots + x_ny_n) = \frac{N}{n^2}\cdot Sx \cdot Sy = 0.$$

Hence $S(XY)$ tends to $N\bar{x}\bar{y}$, and the mean of the product 
XY tends to equal the product of the means of X and of Y. 
But if the selection of Y is not independent of that of X, 
the $n^2$ products $x_1y_1$ etc. do not come with equal frequency, 
and mean $XY = \bar{x}\bar{y} + \text{Mean } xy$. 
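
The two cases can be checked by direct enumeration. The following is a minimal sketch in Python, assuming three hypothetical magnitudes of each kind (the values are chosen only for illustration): under independent selection all $n^2$ products are equally frequent, while under dependent selection each $X_t$ is always taken with its own $Y_t$.

```python
from itertools import product


def mean(values):
    """Arithmetic average of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Hypothetical magnitudes, chosen only for illustration.
X = [2.0, 4.0, 9.0]
Y = [1.0, 5.0, 6.0]
x_bar, y_bar = mean(X), mean(Y)

# Independent selection: each of the n^2 products X_i * Y_j is equally frequent.
mean_xy_independent = mean(xi * yj for xi, yj in product(X, Y))

# Dependent selection: X_t is always found together with Y_t.
mean_xy_paired = mean(xi * yi for xi, yi in zip(X, Y))

# Mean product of the deviations x, y from the two averages.
mean_dev_product = mean((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
```

In the first case the mean product equals the product of the means; in the second it exceeds it by exactly the mean product of the deviations.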


Again if $\bar{m}$ is the unweighted average of $M_1, M_2, \dots$, where 
in the notation of pp. 319, 320 $M_t = \bar{m} + m_t$, and $m_w$ is the 
weighted average, where $W_t$, the weight for $M_t$, $= \bar{w} + w_t$, $\bar{w}$ 
being the average of the weights, 

$$m_w = \frac{S(W_tM_t)}{n\bar{w}} = \frac{S(\bar{w} + w_t)(\bar{m} + m_t)}{n\bar{w}} = \frac{n\bar{w}\bar{m} + S\,w_tm_t}{n\bar{w}} = \bar{m}\left(1 + \frac{1}{n}\,S\,\frac{w_t}{\bar{w}}\cdot\frac{m_t}{\bar{m}}\right) \quad (91)$$

$m_w = \bar{m}$ only if $S\,w_tm_t = 0$, and this will only be the case if 
n is large, and if on the whole a large weight is not more often 
found with a large than with a small quantity, or vice versa. 


On p. 288 it was shown that with the notation above, the 
mean of $(x + y)^2$ was $\sigma_x^2 + \sigma_y^2$, where $\sigma_x$ and $\sigma_y$ are the 
standard deviations of X and Y, if the selections of X and 
Y are quite independent. 

It is easy to see that when there is dependence the analysis 
is modified and leads to 

$$s^2 = \text{Mean } (x + y)^2 = \sigma_x^2 + \sigma_y^2 + 2\,\text{Mean } xy \quad (92)$$


The Coefficient of Correlation. 

Hence it appears that the quantity Mean xy enters into 
many expressions when X and Y are not independent and 
that in itself it gives an indication of the existence and amount 
of correlation. However, its magnitude depends on the 
units used in measuring x and y so that there is no natural 
scale for it, and consequently a quantity defined as follows is 
used in preference to it. 

If $X_1Y_1, X_2Y_2, \dots, X_tY_t, \dots, X_nY_n$ are pairs of measurements, 
and the averages and standard deviations of the X's and Y's are 
$\bar{x}, \sigma_x, \bar{y}, \sigma_y$, then the coefficient of correlation between X, Y is written 
$r_{xy}$, where 

$$r_{xy} = \frac{S\{(X_t - \bar{x})(Y_t - \bar{y})\}}{n\sigma_x\sigma_y} = \frac{1}{n}\,S\left(\frac{x_t}{\sigma_x}\cdot\frac{y_t}{\sigma_y}\right)$$

(taking $X_t = \bar{x} + x_t$, $Y_t = \bar{y} + y_t$) 

$$= \frac{1}{n\sigma_x\sigma_y}\{S(X_tY_t) - \bar{x}\,SY_t - \bar{y}\,SX_t + n\bar{x}\bar{y}\}$$

$$= \frac{S(X_tY_t) - n\bar{x}\bar{y}}{n\sigma_x\sigma_y} \quad (93)$$

since $SY_t = n\bar{y}$, $SX_t = n\bar{x}$. 
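
The two forms of formula (93) can be compared numerically. The sketch below, in Python, uses five hypothetical pairs of measurements (not taken from the text) and computes $r$ first from the deviations and then from the raw products.

```python
from math import sqrt


def correlation(xs, ys):
    """Return r computed from deviations and from raw products, as in (93)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sigma_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sigma_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    denom = n * sigma_x * sigma_y
    # First form: sum of products of deviations from the averages.
    r_dev = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / denom
    # Second form: S(XY) - n * x_bar * y_bar, needing no deviations.
    r_raw = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / denom
    return r_dev, r_raw


# Hypothetical pairs of measurements, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
r_dev, r_raw = correlation(xs, ys)

# Changing the unit of X and the origin of Y leaves r unchanged.
r_rescaled, _ = correlation([10 * x for x in xs], [y + 7 for y in ys])
```

The second form is the more convenient in arithmetic, since the deviations need not be written out.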

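In the examples just given 

$$\text{Mean } XY = \bar{x}\bar{y} + r_{xy}\sigma_x\sigma_y$$

$$m_w = \bar{m}\left(1 + r_{mw}\,\frac{\sigma_m\sigma_w}{\bar{m}\bar{w}}\right)$$

$$s^2 = \sigma_x^2 + \sigma_y^2 + 2r_{xy}\sigma_x\sigma_y.$$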

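Write $r$ for $r_{xy}$. It can readily be shown that $r$ is never $> 1$ 
or $< -1$. 

For $n^2r^2\sigma_x^2\sigma_y^2 = (S\,x_ty_t)^2$; 

but 

$$n^2\sigma_x^2\sigma_y^2 - (S\,x_ty_t)^2 = (x_1^2 + x_2^2 + \dots)(y_1^2 + y_2^2 + \dots) - (x_1y_1 + x_2y_2 + \dots)^2,$$

since $n\sigma_x^2 = x_1^2 + x_2^2 + \dots$, and $n\sigma_y^2 = y_1^2 + y_2^2 + \dots$, 

$$= (x_1y_2 - x_2y_1)^2 + (x_1y_3 - x_3y_1)^2 + \dots + (x_2y_3 - x_3y_2)^2 + \dots + (x_ny_{n-1} - x_{n-1}y_n)^2,$$

which is $> 0$, unless $x_1y_2 - x_2y_1 = 0 = x_1y_3 - x_3y_1 = \dots$, and 

$$\frac{x_1}{y_1} = \frac{x_2}{y_2} = \frac{x_3}{y_3} = \dots = \frac{x_n}{y_n} = \pm\frac{\sigma_x}{\sigma_y},$$

in which case the expression $= 0$, and $r = \pm 1$. 

$$\therefore\ r^2 = \frac{(S\,x_ty_t)^2}{n^2\sigma_x^2\sigma_y^2} < 1 \text{ and } 1 > r > -1, \text{ unless } y \text{ varies directly as } x,$$

and then $r = +1$ or $-1$ (94) 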

Hence $r$ is a quantity which depends on all the observations, 
is zero when independence is complete and Mean $xy = 0$, 
is independent of the units in which X and Y are measured, 
increases whenever a positive $x_t$ is found with a positive $y_t$ 
or a negative $x_t$ with a negative $y_t$, but only reaches the value 
$+1$ (which it can never exceed) when $x$ and $y$ are connected 
rigidly by the equation $y = x \times \text{constant}$. If positive x's 
are found with negative y's and vice versa, $r$ varies from 0 
to $-1$. 

$r$ is therefore a sensitive measurement of the amount of 
correlation. 

To distinguish it from other measurements it is sometimes 
called the sum-product coefficient of correlation. 

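If the pairs are grouped in arrays, such as $x_s, {}_1y_s$; $x_s, {}_2y_s$ . . . 
and $\bar{y}_s$ is the average of the $n_s$ quantities ${}_1y_s, {}_2y_s$ . . . , then 

$$Sxy = S(x_s \cdot n_s\bar{y}_s), \quad\text{and}\quad \frac{\sigma_y}{\sigma_x}\,r = \frac{S\left(n_sx_s^2\cdot\dfrac{\bar{y}_s}{x_s}\right)}{S(n_sx_s^2)}, \quad\text{where } S\,n_sx_s^2 = n\sigma_x^2.$$

$\frac{\sigma_y}{\sigma_x}\,r$ is therefore a weighted average of the ratios on p. 352, 
and $y = r\frac{\sigma_y}{\sigma_x}x$ is an approximation to the locus of Q. 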

On the following pages we examine the circumstances 
which give r various numerical values, study the distribution 
of X,Y on various hypotheses, and find the equations of the 
lines of regression. 

Nature of r . 

Let X and Y be two variable quantities which depend on other 
variables U, V, W in such a way that 

$$X_t = {}_1U_t + {}_2U_t + \dots + {}_pU_t + {}_1V_t + {}_2V_t + \dots + {}_qV_t,$$

$$Y_t = {}_1U_t + {}_2U_t + \dots + {}_pU_t + {}_1W_t + {}_2W_t + \dots + {}_qW_t,$$

where ${}_1U_t$ is selected at random from a frequency group of any 
form whose mean is ${}_1\bar{u}$ and standard deviation ${}_1\sigma_u$, ${}_2U_t$ is selected 
independently from another group, and so on throughout the U's, 
V's and W's. $p$ and $q$ are any integers. 

Write ${}_1U_t = {}_1\bar{u} + {}_1u_t$ etc., and $X_t = \bar{x} + x_t$, $Y_t = \bar{y} + y_t$ where 
$\bar{x}$ and $\bar{y}$ are the means of all possible values of X and Y. Let $\sigma_x$, 
$\sigma_y$ be the standard deviations of X and Y. 

Then 

$$x_t = {}_1u_t + {}_2u_t + \dots + {}_pu_t + {}_1v_t + {}_2v_t + \dots + {}_qv_t,$$

$$y_t = {}_1u_t + {}_2u_t + \dots + {}_pu_t + {}_1w_t + {}_2w_t + \dots + {}_qw_t;$$

$$\sigma_x^2 = {}_1\sigma_u^2 + \dots + {}_p\sigma_u^2 + {}_1\sigma_v^2 + \dots + {}_q\sigma_v^2,$$

$$\text{and } \sigma_y^2 = {}_1\sigma_u^2 + \dots + {}_p\sigma_u^2 + {}_1\sigma_w^2 + \dots + {}_q\sigma_w^2,$$

since, by hypothesis, the u's, v's, and w's are all independent of 
each other (p. 288). 

If ${}_1\sigma_u = {}_2\sigma_u = \dots = \sigma_u$, ${}_1\sigma_v = {}_2\sigma_v = \dots = \sigma_v$, and ${}_1\sigma_w = {}_2\sigma_w = \dots = \sigma_w$, 
or if $\sigma_u^2$, $\sigma_v^2$, $\sigma_w^2$ are mean values of the U, V, and W standard 
deviations squared, then 

$$\sigma_x^2 = p\sigma_u^2 + q\sigma_v^2, \quad \sigma_y^2 = p\sigma_u^2 + q\sigma_w^2.$$


Also, 

$$\text{mean } x_ty_t = \text{mean } {}_1u_t^2 + \text{mean } {}_2u_t^2 + \dots + \text{mean } {}_1u_t\cdot{}_1w_t + \dots + \text{mean } {}_1v_t\cdot{}_1w_t + \dots = p\sigma_u^2,$$

for, since, by hypothesis, the selections of the various U's, V's and 
W's are independent of each other, such a term as mean ${}_1u_t\cdot{}_1w_t$ is 
zero in the long run. 

Hence 

$$r = \frac{\text{mean } x_ty_t}{\sigma_x\sigma_y} = \frac{p\sigma_u^2}{\sqrt{(p\sigma_u^2 + q\sigma_v^2)(p\sigma_u^2 + q\sigma_w^2)}} \quad (95)$$

and, in particular, if $\sigma_u = \sigma_v = \sigma_w$, 

$$r = \frac{p}{p + q} \quad (96)$$


This is the simplest conception of the numerical value of 
r ; expressed in words it shows that the correlation coefficient 
tends to be the ratio of the number of causes common in the 
genesis of two variables to the whole number of independent 
causes on which each depends. 
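
Formula (96) can be verified exactly in a small case. The sketch below, in Python, makes the simplifying assumption (not in the text) that every elemental deviation is $+1$ or $-1$ with equal chance, so that all standard deviations are equal and every combination of the $p + 2q$ independent elements can be enumerated; the elemental groups are then far from normal, which the argument above does not require.

```python
from itertools import product
from math import sqrt


def exact_r(p, q):
    """Exact r when each element is +1 or -1 with equal chance (an assumption)."""
    pairs = []
    # Enumerate every equally likely combination of the p + 2q elements:
    # p common u's, q v's entering x only, q w's entering y only.
    for combo in product((-1, 1), repeat=p + 2 * q):
        u, v, w = combo[:p], combo[p:p + q], combo[p + q:]
        pairs.append((sum(u) + sum(v), sum(u) + sum(w)))
    n = len(pairs)
    # The means of x and y are zero by symmetry, so raw moments suffice.
    mean_xy = sum(x * y for x, y in pairs) / n
    sigma_x = sqrt(sum(x * x for x, _ in pairs) / n)
    sigma_y = sqrt(sum(y * y for _, y in pairs) / n)
    return mean_xy / (sigma_x * sigma_y)


r_two_common_of_three = exact_r(p=2, q=1)
r_three_common_of_four = exact_r(p=3, q=1)
```

With two causes common out of three the coefficient is two-thirds, and with three common out of four it is three-quarters, in agreement with $r = p/(p+q)$.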


If constants a, b, c, d are introduced so that 

$$x_t = a_1\cdot{}_1u_t + \dots + a_p\cdot{}_pu_t + b_1\cdot{}_1v_t + \dots + b_q\cdot{}_qv_t,$$

$$y_t = c_1\cdot{}_1u_t + \dots + c_p\cdot{}_pu_t + d_1\cdot{}_1w_t + \dots + d_q\cdot{}_qw_t, \quad (97)$$

then $\sigma_x^2 = S\,a^2\sigma_u^2 + S\,b^2\sigma_v^2$, $\sigma_y^2 = S\,c^2\sigma_u^2 + S\,d^2\sigma_w^2$, 

$$\text{mean } x_ty_t = S\,ac\,\sigma_u^2,$$

and the expression for $r$ can be readily written down. 


The Correlation Surface . 


Consider the case where the frequency curves of the U's, 
V's and W's are normal, and in the first place let us examine 
the grouping of X, Y in the simple case where 

$$x_t = u_t + v_t, \quad y_t = u_t + w_t.$$

The chance of the concurrence of selections $u_t$, $v_t$, $w_t$ is 

$$\frac{1}{\sigma_u\sqrt{2\pi}}e^{-\frac{u_t^2}{2\sigma_u^2}}\cdot\frac{1}{\sigma_v\sqrt{2\pi}}e^{-\frac{v_t^2}{2\sigma_v^2}}\cdot\frac{1}{\sigma_w\sqrt{2\pi}}e^{-\frac{w_t^2}{2\sigma_w^2}}.$$

Eliminate $v_t$ and $w_t$. 

The chance of obtaining $x_t$ (to $x_t + \delta x$), $y_t$ (to $y_t + \delta y$), when 
a particular value $u_t$ (to $u_t + \delta u$) has been selected from the 
U group, is 

$$\frac{1}{\sigma_u\sigma_v\sigma_w(2\pi)^{3/2}}\,e^{-\frac{1}{2}\left(\frac{u_t^2}{\sigma_u^2} + \frac{(x_t - u_t)^2}{\sigma_v^2} + \frac{(y_t - u_t)^2}{\sigma_w^2}\right)}\,\delta x\,\delta y\,\delta u$$

$$= \frac{1}{\sigma_u\sigma_v\sigma_w(2\pi)^{3/2}}\,e^{-\frac{1}{2}k(u_t - lx_t - my_t)^2 - \frac{1}{2}(ax_t^2 + 2hx_ty_t + by_t^2)}\,\delta x\,\delta y\,\delta u,$$

where 

$$k = \frac{1}{\sigma_u^2} + \frac{1}{\sigma_v^2} + \frac{1}{\sigma_w^2}, \quad kl = \frac{1}{\sigma_v^2}, \quad km = \frac{1}{\sigma_w^2},$$

$$a = \frac{1}{\sigma_v^2} - kl^2, \quad b = \frac{1}{\sigma_w^2} - km^2, \quad h = -klm.$$

The chance (say $P_{xy}\,\delta x\,\delta y$) of obtaining $x_t$, $y_t$ from any value 
of $u$ is obtained by adding the chances of obtaining them 
from an assigned value of $u$. 

Hence, writing $x$, $y$ for $x_t$, $y_t$, 

$$P_{xy} = \frac{e^{-\frac{1}{2}(ax^2 + 2hxy + by^2)}}{\sigma_u\sigma_v\sigma_w\cdot 2\pi}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}k(u - lx - my)^2}\,du = \frac{e^{-\frac{1}{2}(ax^2 + 2hxy + by^2)}}{\sigma_u\sigma_v\sigma_w\,2\pi\sqrt{k}}.$$

Now $\sigma_x^2 = \sigma_u^2 + \sigma_v^2$, $\sigma_y^2 = \sigma_u^2 + \sigma_w^2$, $r\sigma_x\sigma_y = \sigma_u^2$, 

$$k = \frac{(\sigma_u^2 + \sigma_v^2)(\sigma_u^2 + \sigma_w^2) - \sigma_u^4}{\sigma_u^2\sigma_v^2\sigma_w^2} = \frac{\sigma_x^2\sigma_y^2(1 - r^2)}{\sigma_u^2\sigma_v^2\sigma_w^2},$$

$$a = kl - kl^2 = l(k - kl) = l\left(\frac{1}{\sigma_u^2} + \frac{1}{\sigma_w^2}\right) = \frac{1}{k\sigma_v^2}\cdot\frac{\sigma_y^2}{\sigma_u^2\sigma_w^2} = \frac{1}{\sigma_x^2(1 - r^2)},$$

and similarly 

$$b = \frac{1}{\sigma_y^2(1 - r^2)},$$

$$h = -\frac{kl\cdot km}{k} = -\frac{1}{k\sigma_v^2\sigma_w^2} = -\frac{\sigma_u^2}{\sigma_x^2\sigma_y^2(1 - r^2)} = -\frac{r}{\sigma_x\sigma_y(1 - r^2)}.$$

$$\therefore\ P_{xy} = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}}\,e^{-\frac{1}{2(1 - r^2)}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y}\right)} \quad (98)$$

By an extension of this method it is shown (Elderton, 
Frequency Curves, pp. 109 seq., following Pearson, Trans. 
of Royal Society, vol. 187 (1896), A, 175) that if $x_t$, $y_t$ are formed 
by the weighted sum of a number of variables, all of normal 
frequency, as expressed in the equations (97) above, $P_{xy}$ is of 
the form $\kappa\,e^{-\frac{1}{2}(ax^2 + 2hxy + by^2)}$ as just found, and when the 
conditions that this surface shall have unit volume, standard 
deviations $\sigma_x$, $\sigma_y$ and mean product $r\sigma_x\sigma_y$ are expressed by 
integration, the values of $\kappa$, $a$, $h$, $b$ are the same as in the simple 
case discussed. 

Though this method is of considerable interest, and by it 
the measurement of correlation by the product-sum formula 
was introduced into modern statistics by Professor Pearson, 
its importance is greatly diminished by the assumption that 
the elemental frequency curves are normal. The following 
analysis is free from this assumption ; it is derived from 
Professor Edgeworth’s paper on The Law of Great Numbers, 
to which reference has already been made. 

Edgeworth's Method. 

Let 

$$x_t = {}_1u_t + {}_2u_t + \dots + {}_pu_t + \dots + {}_nu_t, \quad y_t = {}_1v_t + {}_2v_t + \dots + {}_pv_t + \dots + {}_nv_t,$$

where ${}_1u_t$ is the deviation from its mean of a quantity selected 
from a curve of frequency whose standard deviation is ${}_1\sigma_u$, 
and ${}_2u_t \dots {}_nu_t$, ${}_1v_t \dots {}_nv_t$ have similar meanings. Let 
the selection of the various $u_t$'s be quite independent of each 
other, so that mean ${}_1u_t\cdot{}_2u_t$ etc. tend to zero, and let the v's 
be similarly independent; but let some, at any rate, of the 
v's be not independent of the u's, so that mean ${}_1u_t\cdot{}_1v_t$, 
mean ${}_2u_t\cdot{}_2v_t$, . . . mean ${}_nu_t\cdot{}_nv_t$ do not all tend to zero. Such 
a quantity as mean ${}_1u_t\cdot{}_2v_t$ is, however, to be taken as tending 
to zero. 

Let $n$ be large and $1/\sqrt{n}$ negligible, and the other conditions 
described above (p. 299) be satisfied, so that the curves of 
frequency of $x$ and $y$ taken separately are normal curves of error. 

Then $\sigma_x^2 = S({}_p\sigma_u^2)$, $\sigma_y^2 = S({}_p\sigma_v^2)$, 

and mean $xy = S(\text{mean } {}_pu_t\cdot{}_pv_t)$, .... (99) 

where $p$ is any integer from 1 to $n$. 

Now rotate the axes on which $x$ and $y$ are measured 
through an angle $\theta$ determined by $\tan 2\theta = \dfrac{2r\sigma_x\sigma_y}{\sigma_x^2 - \sigma_y^2}$, where 
$r$ is the coefficient of correlation between $x$ and $y$. 

Thus we write 

$$X_t = x_t\cos\theta + y_t\sin\theta, \quad Y_t = x_t\sin\theta - y_t\cos\theta.$$

Similarly write 

$${}_pU_t = {}_pu_t\cos\theta + {}_pv_t\sin\theta, \quad {}_pV_t = {}_pu_t\sin\theta - {}_pv_t\cos\theta.*$$

Then $X_t = S\,{}_pU_t$, and $Y_t = S\,{}_pV_t$. 

$$\text{Mean } X_tY_t = \sin\theta\cos\theta\,(\sigma_x^2 - \sigma_y^2) - \cos 2\theta\cdot(\text{mean } xy) = 0,$$

from the value assigned to $\tan 2\theta$. 

$$\sigma_X^2 = \cos^2\theta\cdot\sigma_x^2 + \sin^2\theta\cdot\sigma_y^2 + \sin 2\theta\cdot r\sigma_x\sigma_y = S(\text{mean } {}_pU_t^2)$$

$$\sigma_Y^2 = \sin^2\theta\cdot\sigma_x^2 + \cos^2\theta\cdot\sigma_y^2 - \sin 2\theta\cdot r\sigma_x\sigma_y = S(\text{mean } {}_pV_t^2)$$

$$\sigma_X^2 + \sigma_Y^2 = \sigma_x^2 + \sigma_y^2$$

$$\sigma_X^2 - \sigma_Y^2 = (\sigma_x^2 - \sigma_y^2)\cos 2\theta + 2r\sigma_x\sigma_y\sin 2\theta = (\sigma_x^2 - \sigma_y^2)\sec 2\theta$$

$$\therefore\ 4\sigma_X^2\sigma_Y^2 = 4\sigma_x^2\sigma_y^2(1 - r^2)$$

$$\text{mean } {}_pU_t\cdot{}_pV_t = \sin\theta\cos\theta\,({}_p\sigma_u^2 - {}_p\sigma_v^2) - \cos 2\theta\,(\text{mean } {}_pu_t\cdot{}_pv_t).$$

$$S(\text{mean } {}_pU_t\cdot{}_pV_t) = \text{mean } X_tY_t = 0 \quad (100)$$
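
The effect of the rotation can be checked arithmetically. The sketch below, in Python, takes hypothetical values of $\sigma_x$, $\sigma_y$ and $r$ (assumptions for illustration), finds $\theta$ from the value assigned to $\tan 2\theta$, and evaluates the second moments of the rotated variables from the expressions above. The function name and the use of `atan2` to select the angle are choices made here, not part of the text.

```python
from math import atan2, cos, sin


def rotated_moments(sigma_x, sigma_y, r):
    """Moments of X = x cos(t) + y sin(t) and Y = x sin(t) - y cos(t)."""
    mean_xy = r * sigma_x * sigma_y
    # tan(2*theta) = 2 r sx sy / (sx^2 - sy^2); atan2 also covers sx == sy.
    theta = 0.5 * atan2(2 * mean_xy, sigma_x**2 - sigma_y**2)
    c, s = cos(theta), sin(theta)
    var_X = c * c * sigma_x**2 + s * s * sigma_y**2 + 2 * s * c * mean_xy
    var_Y = s * s * sigma_x**2 + c * c * sigma_y**2 - 2 * s * c * mean_xy
    mean_XY = s * c * (sigma_x**2 - sigma_y**2) - (c * c - s * s) * mean_xy
    return theta, var_X, var_Y, mean_XY


# Hypothetical constants, for illustration only.
theta, var_X, var_Y, mean_XY = rotated_moments(sigma_x=3.0, sigma_y=2.0, r=0.5)
```

For these values the mean product of the rotated variables vanishes, the sum of the two squared standard deviations is unchanged at 13, and their product is $\sigma_x^2\sigma_y^2(1 - r^2) = 27$.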

Now follow the method of pp. 296-7 above for calculating 
the moments of X and Y. 

Let $\alpha$, $\beta$ be any small constants, whose use is to collect similar 
terms. 

$$e^{\alpha X_t + \beta Y_t} = e^{\alpha\,{}_1U_t + \beta\,{}_1V_t}\cdot e^{\alpha\,{}_2U_t + \beta\,{}_2V_t}\cdots$$

Expand the exponentials, give $t$ all possible values and 
take their mean, remembering that the mean of a sum is the 
sum of the means of its terms, and that the mean of a product 
of independent factors (as are the factors on the right-hand 
side) is the product of the means of the factors, and that the 
mean of first powers is zero. 

In the expansion of the left-hand side, the coefficient of 
$\alpha^k\beta^l$ occurs in mean $(\alpha X + \beta Y)^{k+l} \div (k + l)!$, and equals mean 
$(X^kY^l) \div (k!\,l!)$, where $k$, $l$ are any integers. 

* These meanings of X, Y, U, V are of course not connected with the use 
of the same letters on p. 355 above. 

The right-hand side = product of $n$ factors 

$$\{1 + \tfrac{1}{2}\alpha^2(\text{mean } {}_pU_t^2) + \tfrac{1}{2}\beta^2(\text{mean } {}_pV_t^2) + \alpha\beta\,(\text{mean } {}_pU_t\,{}_pV_t) + \dots\},$$

$$\therefore\ \log\left(1 + \dots + \frac{\alpha^k\beta^l}{k!\,l!}(\text{mean } X^kY^l) + \dots\right),$$

where $k$ or $l$ may be zero, 

$$= S\left[\log\{1 + \tfrac{1}{2}\alpha^2(\text{mean } {}_pU_t^2) + \tfrac{1}{2}\beta^2(\text{mean } {}_pV_t^2) + \alpha\beta\,(\text{mean } {}_pU_t\,{}_pV_t) + \dots\}\right]$$

= (expanding by the logarithmic series and adding terms) 

$$\tfrac{1}{2}\alpha^2\,S(\text{mean } {}_pU_t^2) + \tfrac{1}{2}\beta^2\,S(\text{mean } {}_pV_t^2) + \alpha\beta\,S(\text{mean } {}_pU_t\,{}_pV_t) + \dots$$

$$= \tfrac{1}{2}\alpha^2\sigma_X^2 + \tfrac{1}{2}\beta^2\sigma_Y^2 + \alpha\beta\times 0 + \text{terms involving } \alpha^3,\ \alpha^2\beta, \text{ etc.}$$

By arguments similar to those on p. 297 it is found that the 
terms in $\alpha^3$ are of order $\frac{1}{\sqrt{n}}$ in comparison with terms in $\alpha^2$, 
and hence, when $\frac{1}{\sqrt{n}}$ is neglected, 

$$1 + \dots + \frac{\alpha^k\beta^l}{k!\,l!}(\text{mean } X^kY^l) + \dots = e^{\frac{1}{2}\alpha^2\sigma_X^2}\cdot e^{\frac{1}{2}\beta^2\sigma_Y^2}$$

$$= \left(1 + \tfrac{1}{2}\alpha^2\sigma_X^2 + \dots + \frac{1}{k!}\left(\tfrac{1}{2}\alpha^2\sigma_X^2\right)^k + \dots\right)\left(1 + \tfrac{1}{2}\beta^2\sigma_Y^2 + \dots\right).$$

Equate coefficients of terms on the two sides of this 
equation. 

We have (when $l = 0$) mean $X^{2k+1} = 0$, mean $X^{2k} = \dfrac{(2k)!}{2^k\,k!}\,\sigma_X^{2k}$ 
as in the normal curve, and similarly for Y when $k = 0$. 

All means involving odd powers of X or Y are zero. 

$$\text{Mean } (X^{2k}Y^{2l}) = \frac{(2k)!}{2^k\,k!}\cdot\frac{(2l)!}{2^l\,l!}\,\sigma_X^{2k}\sigma_Y^{2l} \quad (101)$$
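
The even moments quoted for the normal curve can be compared with direct numerical integration. The sketch below, in Python, is only an arithmetical check under assumed values of $\sigma$; the midpoint rule, the step count and the function names are choices made here for illustration.

```python
from math import exp, factorial, pi, sqrt


def normal_even_moment(k, sigma):
    """Mean of X^(2k) for a normal curve: (2k)! / (2^k k!) * sigma^(2k)."""
    return factorial(2 * k) / (2**k * factorial(k)) * sigma ** (2 * k)


def numeric_moment(power, sigma, steps=20000, span=12.0):
    """Midpoint-rule estimate of the mean of X^power under a normal curve."""
    h = 2 * span * sigma / steps
    total = 0.0
    for i in range(steps):
        x = -span * sigma + (i + 0.5) * h
        total += x**power * exp(-x * x / (2 * sigma * sigma))
    return total * h / (sigma * sqrt(2 * pi))


# Fourth moment for an assumed sigma of 1.5: formula against integration.
fourth_by_formula = normal_even_moment(2, 1.5)
fourth_by_integration = numeric_moment(4, 1.5)
# An odd moment, which should vanish.
third_by_integration = numeric_moment(3, 1.5)
```

For independent X and Y the product moment in (101) is simply the product of two such factors, one for each variable.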


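These are precisely the mean powers found by integrating 
the surface 

$$z = \frac{1}{2\pi\sigma_X\sigma_Y}\,e^{-\frac{1}{2}\left(\frac{X^2}{\sigma_X^2} + \frac{Y^2}{\sigma_Y^2}\right)} = \frac{1}{\sigma_X\sqrt{2\pi}}\,e^{-\frac{X^2}{2\sigma_X^2}}\cdot\frac{1}{\sigma_Y\sqrt{2\pi}}\,e^{-\frac{Y^2}{2\sigma_Y^2}},$$

where X is independent of Y. Hence, as on pp. 297-8, we 
may take this equation as giving the frequency of X, Y. 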

It remains to transfer back to the original axes. 

As already shown $\sigma_X\sigma_Y = \sigma_x\sigma_y\sqrt{1 - r^2}$. 

$$X^2\sigma_Y^2 + Y^2\sigma_X^2 = (x\cos\theta + y\sin\theta)^2\sigma_Y^2 + (x\sin\theta - y\cos\theta)^2\sigma_X^2$$

$$= x^2(\cos^2\theta\,\sigma_Y^2 + \sin^2\theta\,\sigma_X^2) + y^2(\sin^2\theta\,\sigma_Y^2 + \cos^2\theta\,\sigma_X^2) - xy\sin 2\theta\,(\sigma_X^2 - \sigma_Y^2).$$

From the equations preceding (100), the sum of the coefficients of $x^2$ and $y^2$ 

$$= \sigma_X^2 + \sigma_Y^2 = \sigma_x^2 + \sigma_y^2,$$

and their difference $= \cos 2\theta\,(\sigma_Y^2 - \sigma_X^2) = \sigma_y^2 - \sigma_x^2$, 
hence the coefficients are $\sigma_y^2$ and $\sigma_x^2$. 

Also $\sin 2\theta\,(\sigma_X^2 - \sigma_Y^2) = \tan 2\theta\,(\sigma_x^2 - \sigma_y^2) = 2r\sigma_x\sigma_y$. 

Hence 

$$\frac{X^2}{\sigma_X^2} + \frac{Y^2}{\sigma_Y^2} = \frac{1}{\sigma_x^2\sigma_y^2(1 - r^2)}\left(x^2\sigma_y^2 + y^2\sigma_x^2 - 2r\sigma_x\sigma_y\cdot xy\right)$$

and the equation of the surface is 

$$z = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}}\,e^{-\frac{1}{2(1 - r^2)}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y}\right)} \quad (102)$$

as already found (formula (98)) on the simpler hypothesis that the 
elemental groups have normal frequency. 

In this equation $\sigma_x^2 = S\,{}_p\sigma_u^2$, $\sigma_y^2 = S\,{}_p\sigma_v^2$, $r\sigma_x\sigma_y = S(r_p\cdot{}_p\sigma_u\cdot{}_p\sigma_v)$, 
from equations (99), where $r_p$ is the correlation coefficient 
between ${}_pu$ and ${}_pv$. 

Constants can be introduced in the original equations so 
that $x = {}_1a\,{}_1u + {}_2a\,{}_2u + \dots$ and $y = {}_1b\,{}_1v + {}_2b\,{}_2v + \dots$ 
without affecting the method of analysis. 


Properties of the Normal Correlation Surface . 

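The centre is at the average of the x and of the y variables. 

$$\text{Volume} = \iint z\,dx\,dy = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{1}{2(1 - r^2)}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y}\right)}\,dx\,dy = 1 \quad (103)$$

$$\text{Second moment in } y = \iint zy^2\,dx\,dy = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{x'^2}{2\sigma_x^2(1 - r^2)}}\cdot y^2\cdot e^{-\frac{y^2}{2\sigma_y^2}}\,dx'\,dy,$$

where $x' = x - r\frac{\sigma_x}{\sigma_y}y$; then integrating in respect of $x'$ we find 
that the expression 

$$= \frac{1}{\sigma_y\sqrt{2\pi}}\int_{-\infty}^{\infty} y^2\,e^{-\frac{y^2}{2\sigma_y^2}}\,dy = \sigma_y^2 \text{ by formula (21),}$$

and similarly the second moment in $x$ is $\sigma_x^2$. (104) 

Mean product of $xy = \iint zxy\,dx\,dy$, $\pm\infty$ being the limits of 
integration, 

$$= \frac{1}{\sigma_y\sqrt{2\pi}}\int_{-\infty}^{\infty} r\frac{\sigma_x}{\sigma_y}\,y^2\,e^{-\frac{y^2}{2\sigma_y^2}}\,dy = r\sigma_x\sigma_y \quad (105)$$

$$\iint zyx^3\,dx\,dy = 3r\sigma_x^3\sigma_y, \quad \iint zx^2y^2\,dx\,dy = (2r^2 + 1)\sigma_x^2\sigma_y^2 \quad (106)$$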
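
Results (103) to (106) can be confirmed by summing the ordinates of the surface over a fine grid. The following Python sketch assumes particular values of $\sigma_x$, $\sigma_y$ and $r$ for illustration; the midpoint rule, the grid size and the function names are choices made here and form no part of the analysis.

```python
from math import exp, pi, sqrt


def surface(x, y, sx, sy, r):
    """Ordinate z of the normal correlation surface, formula (102)."""
    quad = x * x / (sx * sx) + y * y / (sy * sy) - 2 * r * x * y / (sx * sy)
    return exp(-quad / (2 * (1 - r * r))) / (2 * pi * sx * sy * sqrt(1 - r * r))


def surface_moments(sx, sy, r, steps=240, span=8.0):
    """Midpoint-rule estimates of volume, mean y^2, mean xy and mean x^2 y^2."""
    hx, hy = 2 * span * sx / steps, 2 * span * sy / steps
    volume = m_yy = m_xy = m_xxyy = 0.0
    for i in range(steps):
        x = -span * sx + (i + 0.5) * hx
        for j in range(steps):
            y = -span * sy + (j + 0.5) * hy
            z = surface(x, y, sx, sy, r)
            volume += z
            m_yy += z * y * y
            m_xy += z * x * y
            m_xxyy += z * x * x * y * y
    cell = hx * hy
    return volume * cell, m_yy * cell, m_xy * cell, m_xxyy * cell


# Assumed constants, for illustration only.
volume, m_yy, m_xy, m_xxyy = surface_moments(sx=2.0, sy=1.0, r=0.6)
```

With these constants the formulae give a volume of 1, a second moment in $y$ of 1, a mean product of 1.2 and a mean of $x^2y^2$ equal to 6.88.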


The section by every plane parallel to XOZ, YOZ is a normal 
curve. E.g., if $x = x_1$, 

$$z = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}}\,e^{-\frac{1}{2(1 - r^2)\sigma_y^2}\left(y - r\frac{\sigma_y}{\sigma_x}x_1\right)^2}\cdot e^{-\frac{x_1^2}{2\sigma_x^2}} \quad (107)$$

which is a normal curve with its centre at $y = r\frac{\sigma_y}{\sigma_x}x_1$, standard 
deviation $\sigma_y\sqrt{1 - r^2}$, and maximum ordinate 

$$\frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}}\,e^{-\frac{x_1^2}{2\sigma_x^2}}.$$

The frequency group of the y's corresponding to an assigned 
value of $x$ is therefore normal, and its standard deviation is 
independent of $x$. The average of the group (and its mode and 
median) is for all values of $x$ on the line $y = r\frac{\sigma_y}{\sigma_x}x = x\tan\phi_1$ 
(say), and this is the line of regression (p. 352). 

$r\frac{\sigma_y}{\sigma_x}$ is the coefficient of regression of $y$ in relation to $x$. 

Similarly for a given value of $y$, the frequency group in 
$x$ is normal, its standard deviation is $\sigma_x\sqrt{1 - r^2}$, and its 
average is on the line $x = r\frac{\sigma_x}{\sigma_y}y$, say $y = x\tan\phi_2$. 

$r\frac{\sigma_x}{\sigma_y}$ is the coefficient of regression of $x$ in relation to $y$. 

The geometric mean of the two coefficients of regression 
is $r$. 

Horizontal sections are similar ellipses. Thus if $z = z_1$, 

$$\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y} = -2(1 - r^2)\log\left(z_1\cdot 2\pi\sigma_x\sigma_y\sqrt{1 - r^2}\right) \quad (108)$$

The major axis of any such ellipse (where we take $\sigma_x > \sigma_y$) 
makes the angle $\theta$ with the axis of $x$, where 

$$\tan 2\theta = \frac{2r\sigma_x\sigma_y}{\sigma_x^2 - \sigma_y^2}.$$

Now 

$$\tan 2\phi_1 = \frac{2r\sigma_x\sigma_y}{\sigma_x^2 - r^2\sigma_y^2} \quad\text{and}\quad \tan 2\phi_2 = \frac{2r\sigma_x\sigma_y}{r^2\sigma_x^2 - \sigma_y^2}.$$

If $r = \pm 1$, $\theta = \phi_1 = \phi_2$, and the surface degrades into the 
plane $\frac{y}{\sigma_y} = \pm\frac{x}{\sigma_x}$. Otherwise, when $|r| < 1$, $\phi_2 > \theta > \phi_1$, and the 
lines of regression lie on either side of the plane of the major 
axis of the ellipses, and on either side of $\frac{y}{\sigma_y} = \frac{x}{\sigma_x}$. If $\sigma_x = \sigma_y$, $\theta = \frac{\pi}{4}$, 
and the lines of regression are equally inclined to the planes 
containing the principal axes of the ellipses. 

It should be noticed that the surface is completely deter- 
mined by five quantities, viz., two averages, two standard 
deviations, and one correlation coefficient. 


Rectilinear Regression. 

We have found that under certain conditions, of a simple 
nature and dependent mainly on plurality of causation, the 
line of regression, that is the locus of the averages of one 
variable (y) for given values of the other (x), is straight and 
passes through the position representing the general averages. 

If the conditions are not rigidly, but only approximately, 
satisfied, there is a presumption that rectilinearity of regres- 
sion will be approximately attained. 

It may well happen that regression is still approximately 
rectilinear even if the variables x and y are not normally 
distributed, and that the equation $y = r\frac{\sigma_y}{\sigma_x}x$ may still be 
the equation of regression, though the surface of distribution 
is no longer determinable from the value of r. 

Let there be $n_s$ values of $y$ in the array corresponding to a value 
$x_s$ of $x$, and let their average be $\bar{y}_s$. Write $m_s = \frac{\bar{y}_s}{x_s}$, so that $m_s$ is 
the gradient of the line of regression determined from one group 
only. 

Then it is shown above (p. 355) that $\tan\phi_1 = r\frac{\sigma_y}{\sigma_x}$ is a weighted 
average of $m_1, m_2, \dots, m_s, \dots$, the weights being $n_sx_s^2$, &c. 

Let $ON = x_s$, $NP_s$ be any value of $y$ found with $x_s$, and $NQ_s = \bar{y}_s$ 
be the average of $n_s$ such values. Let a line $y = ax + b$ meet $NP_s$ 
at $R_s$. 

Then Mr. Yule shows (Statistical Journal, 1897, pp. 817-8) that 
the sum of all values of $(R_sP_s)^2$, i.e., $S[y_s - (ax_s + b)]^2$, extended 
over all values of $x_s$, is least when $b = 0$ and $a = r\frac{\sigma_y}{\sigma_x}$. This 
method depends on "least squares," for which see Appendix, 
Note 10. 

This line $y = r\frac{\sigma_y}{\sigma_x}x$ then passes through the observations in 
such a way that the sum of the squares of the distances of the 
points representing the observations, measured from it parallel to 
the axis of y, is a minimum. This line is, then, whatever the 
distribution, a good single representation of regression. 
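
The agreement between the least-squares gradient and $r\frac{\sigma_y}{\sigma_x}$ can be seen in a small numerical case. The sketch below, in Python, fits $y = ax + b$ by the ordinary normal equations to five hypothetical observations already measured from their averages, and computes $r\frac{\sigma_y}{\sigma_x}$ separately from the definitions; the data and function names are illustrative assumptions.

```python
from math import sqrt


def least_squares_line(xs, ys):
    """Fit y = a*x + b by minimising the sum of squared vertical distances."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x**2)
    b = (sum_y - a * sum_x) / n
    return a, b


def regression_coefficient(xs, ys):
    """r * sigma_y / sigma_x, computed from deviations from the averages."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sigma_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sigma_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n * sigma_x * sigma_y)
    return r * sigma_y / sigma_x


# Hypothetical observations, each set summing to zero so that the
# origin is the double average.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-1.0, -2.0, 1.0, 0.0, 2.0]
a, b = least_squares_line(xs, ys)
coefficient = regression_coefficient(xs, ys)
```

Here the fitted line has $b = 0$ and gradient 0.8, which is also the value of $r\frac{\sigma_y}{\sigma_x}$ for these observations.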


We can proceed a step further, if we assume that the 
dispersion of the y's in any $s$th array is independent of the 
value of $x_s$, and is always $\sigma$. For if the averages tend to lie 
on a straight line, and only fail to do so exactly because of the 
paucity of observations, then the deviation from the average, 
namely $R_sQ_s$, has a curve of frequency $K e^{-\frac{(R_sQ_s)^2}{2\sigma_s^2}}$ where $\sigma_s^2 = \frac{\sigma^2}{n_s}$ 
(p. 312). 

Hence the joint probability of deviations $R_1Q_1$, $R_2Q_2$ . . . is 
$K' e^{-\frac{1}{2\sigma^2}S\,n_s(R_sQ_s)^2}$, and this is greatest when $S\,n_s(R_sQ_s)^2$ is least. 

$$S\,n_s(R_sQ_s)^2 = S[n_s(\bar{y}_s - (ax_s + b))^2]$$

$$= S(n_s\bar{y}_s^2) + a^2S(n_sx_s^2) + Nb^2 - 2aS(n_s\bar{y}_sx_s) - 2bS\,n_s\bar{y}_s + 2abS\,n_sx_s,$$

where $N = S\,n_s$. 

Here $S\,n_s\bar{y}_s = 0 = S\,n_sx_s$ since the origin is the double average. 

$$S(n_sx_s^2) = Sx^2 = N\sigma_x^2, \quad S(n_s\bar{y}_sx_s) = Sxy = Nr\sigma_x\sigma_y$$

and the expression equals 

$$S\,n_s\bar{y}_s^2 + N(a\sigma_x - r\sigma_y)^2 + Nb^2 - Nr^2\sigma_y^2.$$

This is least when $b = 0$, and $a = r\frac{\sigma_y}{\sigma_x}$ as before. 

The line $y = r\frac{\sigma_y}{\sigma_x}x$ is then the most probable locus of 
regression, if we assume rectilinearity and independence 
between deviation in an array and the corresponding value of 
$x$, the deviations from rectilinearity being due to fewness 
of observations. 

The value of $S\,n_s(R_sQ_s)^2$ reckoned from this line is 
$S\,n_s\bar{y}_s^2 - Nr^2\sigma_y^2$. 

If, however, there is nothing in the genesis of the measure- 
ments, or in their results, to justify the assumption of recti- 
linearity, r ceases to be an intelligible measurement of the 
amount or degree of commonness of causation, though it may 
still be a useful function of the quantities in analysis. 


The Correlation Ratio. 

To obtain a measurement completely independent of 
assumptions about distribution of the observations, Professor 
Pearson has devised the correlation ratio (Drapers' Company 
Research Memoirs, Biometric Series, II., 1905). 

Let ${}_s\sigma_y$ be the standard deviation of the $s$th array, so that 
$n_s\cdot{}_s\sigma_y^2 = S(y_s - \bar{y}_s)^2$, and write $\sigma_a^2$ for the weighted mean of 
${}_1\sigma_y^2$, ${}_2\sigma_y^2$ . . . so that $N\sigma_a^2 = S(n_s\cdot{}_s\sigma_y^2) = SS(y_s - \bar{y}_s)^2$, the inner 
summation being extended over an array, and the outer indicating 
the sum over all the arrays. 

Write $N\sigma_m^2 = S(n_s\bar{y}_s^2)$, so that $\sigma_m^2$ is the weighted mean 
square of the averages of the arrays. 

Then $N\sigma_y^2 = S(y^2)$, the summation being extended over all 
values of $y$, and 

$$N\sigma_a^2 = S\{S(y_s^2) - 2\bar{y}_s\,Sy_s + n_s\bar{y}_s^2\} = S(Sy_s^2 - n_s\bar{y}_s^2) = S(y^2) - S(n_s\bar{y}_s^2).$$

$\therefore\ \sigma_y^2 = \sigma_m^2 + \sigma_a^2$, as is otherwise evident. 

Now write 

$$\eta = \frac{\sigma_m}{\sigma_y} = \sqrt{1 - \frac{\sigma_a^2}{\sigma_y^2}} \quad (109)$$

$\eta$ is then called the correlation ratio. It is the ratio of the 
scattering of the averages of the arrays to the scattering of the 
group not regimented into arrays. 

$\eta = 0$ only if $\sigma_m = 0$, and therefore if every $\bar{y}_s = 0$; that 
is, if the average of every array is coincident with the general 
average of the group. 

$\eta = 1$ only if $\sigma_a = 0$, that is if every ${}_s\sigma_y = 0$, and the terms 
in each array are concentrated at a single point, $\bar{y}_s$. 

Otherwise $1 > \eta > 0$. 

In normal correlation every ${}_s\sigma_y = \sigma_y\sqrt{1 - r^2}$, formula (107); 
and then $\sigma_a^2 = \sigma_y^2(1 - r^2)$, and $\eta^2 = r^2$. 

In other cases we have, as shown above, p. 365, 

$$S\,n_s(R_sQ_s)^2 = N\sigma_m^2 - Nr^2\sigma_y^2 = N\sigma_y^2(\eta^2 - r^2)$$

$$\therefore\ \eta^2 = r^2 + \frac{S\,n_s(R_sQ_s)^2}{N\sigma_y^2} \quad (110)$$

and $|\eta| > |r|$, unless every $R_sQ_s$ is 0 and the means of the 
arrays all lie on the line $y = r\frac{\sigma_y}{\sigma_x}x$. 
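
Formula (110) can be illustrated on a small set of arrays. The Python sketch below uses two hypothetical groupings (assumptions for illustration): one in which the averages of the arrays do not lie on a straight line, and one in which they do. It computes $r$, $\eta^2$ and the term $S\,n_s(R_sQ_s)^2 / N\sigma_y^2$ separately, so that the relation between them can be inspected.

```python
from math import sqrt


def correlation_ratio_check(arrays):
    """arrays maps each x value to the list of y values found with it.

    Returns (r, eta squared, S n_s (R_s Q_s)^2 / (N sigma_y^2)).
    """
    pairs = [(x, y) for x, ys in arrays.items() for y in ys]
    N = len(pairs)
    x_bar = sum(x for x, _ in pairs) / N
    y_bar = sum(y for _, y in pairs) / N
    var_x = sum((x - x_bar) ** 2 for x, _ in pairs) / N
    var_y = sum((y - y_bar) ** 2 for _, y in pairs) / N
    r = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / (N * sqrt(var_x * var_y))
    slope = r * sqrt(var_y / var_x)  # gradient of the line of regression
    var_m = 0.0  # weighted mean square of the array averages
    resid = 0.0  # S n_s (R_s Q_s)^2, reckoned from the line of regression
    for x, ys in arrays.items():
        n_s = len(ys)
        y_s = sum(ys) / n_s - y_bar  # array average as a deviation
        var_m += n_s * y_s**2 / N
        resid += n_s * (y_s - slope * (x - x_bar)) ** 2
    return r, var_m / var_y, resid / (N * var_y)


# Hypothetical arrays whose averages (2, 5, 4) are not in a straight line.
r, eta_sq, resid_term = correlation_ratio_check({1: [1, 3], 2: [4, 6], 3: [3, 5]})

# Hypothetical arrays whose averages (2, 4, 6) lie exactly on a line.
r_lin, eta_sq_lin, resid_lin = correlation_ratio_check({1: [1, 3], 2: [3, 5], 3: [5, 7]})
```

In the first grouping $\eta^2$ exceeds $r^2$ by exactly the residual term; in the second the residual term vanishes and $\eta^2 = r^2$.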

We may now sum up the treatment of correlation so far. 
If (x, y) is a pair of measurements (from their averages) 
of two variables (related in space, in time, in a thing or in 
an organism), and if when x is given as positive (or negative) 
there is a presumption that y is positive (or negative), or a 
presumption that y is negative (or positive), then the variables 
are said to be correlated. In such a case $\frac{1}{n}Sxy$ does not tend 
to zero when $n$ is increased, but to a limit written as $r\sigma_x\sigma_y$. 
$r = 0$, $= 1$, $= -1$ have definite meanings; $r$ is sensitive to 
all kinds of relationship between x and y. In general it 
may be expected to be the greater as $\sigma_a$ (the mean scattering 
within the arrays) is less. If x and y are each the sum 
of $p + q$ independent elements of which $p$ (only) are common 
to x and y, then $r$ equals $p/(p + q)$, if the standard deviations 
of the elements are equal. If x and y are generated linearly 
from a multiplicity of independent causes (some of them 
common to x and y), then $r$ defines the whole frequency 
distribution of the pairs, the regression loci are rectilinear, 
and their equations are $y = r\frac{\sigma_y}{\sigma_x}x$, and $x = r\frac{\sigma_x}{\sigma_y}y$. If the 
normal frequency surface cannot be assumed, but regression 
is rectilinear, the same equation is a good empirical statement 
of regression. If nothing can be postulated as to the 
distribution of x and y or the averages of the arrays, the 
meaning of the numerical value of $r$ is undefined (as is always 
the case with $\eta$ when it is not 0 or 1). In general, however, 
$r$ may be said to measure the amount that is common in the 
systems of causation of x and y. 


Correlation between Ungraded Variables. 

The measurement of correlation by the methods so far 
discussed is only possible if we have adequate detailed 
observations. Cases of great interest arise when such detail is not 
forthcoming. 

Colour of Hair. 

                      Parent. 
Son.           Light.     Dark.     Totals. 
Dark .           a          b        $n_1$ 
Light .          c          d        $n_2$ 
Totals         $m_1$      $m_2$        N 

Suppose that sons and parents are separated according 
to the colour of their hair distinguished as light or dark; and 
that of $m_1$ sons of light-haired parents $c$ have light hair and 
the remaining $a$ dark; while of $m_2$ sons of dark-haired parents, 
$d$ have light hair and the remaining $b$ dark. Let $a + b = n_1$, 
$c + d = n_2$, and $n_1 + n_2 = N = m_1 + m_2$. 

Required to determine from these data whether there is 
a relationship between hair-colour of sons and parents, and, 
if so, to measure it. 

If in such a case normal distribution of the variable (say, 
amount of pigment) and normal correlation can be assumed, 
the problem is determinate. For the ratios $m_1 : N$ and $n_1 : N$ 
give (by inverse use of the table, p. 271) the abscissae on the 
scales of pigment which correspond to the division between light 
and dark; for any given value of $r$ the fraction of the correlation 
surface bounded by planes through these abscissae is 
known, and the equation of the fraction $b/N$ to this is, 
conversely, an equation for $r$. 

The necessary analysis is given by Pearson (Phil. Trans. A, 
Vol. CXCV, pp. 1 seq.) and Elderton (Frequency Curves, 
Chapter VII) and results in a troublesome equation for $r$ 
which can be solved approximately. 

If we have control of the data and can make both separations 
at the median, a simple solution can be given. 

Suppose that intelligence in arithmetic and in algebra 
is normally distributed. Arrange a large class of N boys 
in order of intelligence (as known by marks or otherwise) in 
arithmetic; now mark also their order in respect of algebra, 
and suppose that $b$ are found above the median in both 
respects, $c$ below in both, $d$ above in arithmetic and below 
in algebra, and $a$ above in algebra and below in arithmetic. 
It is not assumed that intelligence is measured, but only that 
an order can be assigned. 

Then $a + b = \frac{N}{2} = c + d = a + c = b + d$, 

$a = d = (\frac{1}{4} - q)N$, say, and $b = c = (\frac{1}{4} + q)N$. 

It can be shown as follows that $r = \sin 2\pi q$. 

Take the standard deviation on each scale to be unity. 

Let the required surface be 

$$z = \frac{1}{2\pi\sqrt{1 - r^2}}\,e^{-\frac{1}{2(1 - r^2)}(x^2 + y^2 - 2rxy)}.$$

The principal axis of the surface then makes $\frac{\pi}{4}$ with the axis 
of $x$. $(\frac{1}{4} + q)$ equals the volume in the doubly-positive 
quadrant bounded by the planes $y = 0$, $x = 0$, and these 
planes cut off each of the similar elliptic horizontal sections 
$(\frac{1}{4} + q)$ of their area. Take the ellipse $x^2 + y^2 - 2rxy = 1$. 
Hence in the figure 

Elliptic area CPQ $= (\frac{1}{4} + q)$ of area of ellipse. 

Let $\theta$ be the eccentric angle of P. 

The ellipse referred to its principal axes is 

$$(1 - r)x^2 + (1 + r)y^2 = 1.$$

$$\frac{\tan\theta}{\tan\frac{\pi}{4}} = \frac{\text{major axis}}{\text{minor axis}} = \sqrt{\frac{1 + r}{1 - r}}.$$

$$\therefore\ r = -\cos 2\theta.$$

$$\tfrac{1}{4} + q = \frac{2\ \text{area CPA}}{\text{area of ellipse}} = \frac{2\theta}{2\pi}.$$

$$2\pi q = 2\theta - \frac{\pi}{2}$$

and $\sin 2\pi q = -\cos 2\theta = r$. 

E.g., if 40% of the boys were found to be above the median 
in both, 

$\frac{1}{4} + q = .4$, $q = .15$, $r = \sin(.3\pi) = \sin 54^\circ = .81$. 

If $q = 0$, $r = 0$; if $q = \frac{1}{4}$, $r = 1$. 

In the table relating to 83 boys given by Mr. W. Brown 
(Biometrika, Vol. VII., p. 366), 11 boys are above the median 
in algebra, but below in arithmetic. Here $q = \frac{1}{4} - \frac{11}{83} = .12$, 
and $r = .68$. Mr. Brown using the complete order (and not 
merely the median) obtains .65, and using the marks obtains .79. 

All need correction, given by Mr. Brown, for age and position 
in school. 
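
The arithmetic of the median method is easily mechanised. The following is a short Python sketch, assuming only the relation $r = \sin 2\pi q$ with $\frac{1}{4} + q$ the fraction found above the median in both respects; the function name is hypothetical.

```python
from math import pi, sin


def r_from_median_split(fraction_above_in_both):
    """Correlation implied by the fraction above the median in both respects."""
    q = fraction_above_in_both - 0.25
    return sin(2 * pi * q)


# The case of 40 per cent. above the median in both respects.
r_forty_per_cent = r_from_median_split(0.40)

# Mr. Brown's 83 boys: 11 above in algebra but below in arithmetic,
# so that q = 1/4 - 11/83 and the fraction above in both is 1/2 - 11/83.
r_brown = r_from_median_split(0.5 - 11 / 83)
```

The first value rounds to .81. The second, computed with $q$ unrounded, is about .67; the figure .68 in the text follows from taking $q = .12$.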



If normality of distribution of the two attributes cannot 
be postulated, the problem of measurement of the amount 
of correlation becomes indeterminate, and a number of methods 
have been tried. 


Association. 

The expected number of dark-haired fathers with dark-haired 
sons, if there were no causal connection, would be 
$\frac{n_1}{N}\times\frac{m_2}{N}\times N = \beta$, where out of N cases $n_1$ sons and $m_2$ 
parents were dark-haired. 

The notation of p. 367 being adopted, and $\alpha$, $\gamma$, $\delta$ being the 
number in the $a$, $c$, $d$ compartments that would be the most 
probable in a chance arrangement, we have 

$a + b = \alpha + \beta = n_1$, $a + c = \alpha + \gamma = m_1$ etc. 

$$\text{and } \alpha - a = b - \beta = c - \gamma = \delta - d = b - \frac{n_1m_2}{N}$$

$$= \frac{b(a + b + c + d) - (b + d)(a + b)}{N} = \frac{bc - ad}{N} = qN, \text{ say.}$$

Then $q$ is a measure of association, but no definite meaning 
can be given to it except in extreme cases. 

Instead of $q$, Mr. Yule takes $Q = \dfrac{bc - ad}{bc + ad}$ (the "coefficient 
of association") or $\omega = \dfrac{\sqrt{bc} - \sqrt{ad}}{\sqrt{bc} + \sqrt{ad}}$ (the "coefficient of 
colligation") as measurements. (See Introduction to Theory of 
Statistics, p. 37, and Statistical Journal, 1912, p. 593.) 
$Q = \omega = 0$, if $bc = ad$, and $q = 0$, the case of no association; 
and $Q = \omega = 1$ if $a$ or $d$ is zero, and $-1$ if $b$ or $c$ is zero, 
which cases correspond to the maximum of association on 
this method. 
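
The three measures can be computed together from the four compartments. The sketch below, in Python, follows the lettering of the fourfold table used in this chapter (so that $b$ and $c$ are the compartments in which like goes with like); the counts are hypothetical and the function name is an assumption made for illustration.

```python
from math import sqrt


def association_measures(a, b, c, d):
    """Return (q, Q, omega) for a fourfold table lettered as in the text."""
    N = a + b + c + d
    q = (b * c - a * d) / N**2               # so that qN = (bc - ad) / N
    Q = (b * c - a * d) / (b * c + a * d)    # coefficient of association
    root_bc, root_ad = sqrt(b * c), sqrt(a * d)
    omega = (root_bc - root_ad) / (root_bc + root_ad)  # coefficient of colligation
    return q, Q, omega


# Hypothetical counts, for illustration only.
q, Q, omega = association_measures(a=10, b=40, c=30, d=20)

# An extreme case: with a = 0 both coefficients reach their maximum.
_, Q_extreme, omega_extreme = association_measures(a=0, b=5, c=5, d=3)
```

For the hypothetical counts $q = .1$ and $Q = 5/7$. It follows algebraically from the two definitions that $Q = 2\omega/(1 + \omega^2)$, so that the two coefficients always agree in sign and $\omega$ is numerically the smaller except at 0 and $\pm 1$.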

These coefficients have therefore definite meanings in 
extreme cases, but the meaning of an intermediate value of Q 
can only be appreciated by the examination of numerous instances, and 
in the end it can hardly be affirmed that a greater Q means 
a greater amount of "association," for no definite measurable 
meaning has been given to the term "association." 

Contingency. 

If, instead of trying to find the amount of association, we 
ask for evidence of its existence, that is whether the 
observations could arise if the attributes were independent, we are 
on surer ground. 

If $p : 1 - p$ is the ratio of dark to light-haired among sons 
in general, then from the observations $p = \frac{n_1}{N}$ is the best value 
we can assign. 

Hence the chance that, if N sons were divided arbitrarily 
(e.g. according as their Christian names began with A to K 
or L to Z) into two groups containing respectively $m_1$ and $m_2$ 
of them, $a$ would be found in the first group is that discussed 
above (p. 282-4), and may be written 

$$\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}} \quad \text{(small terms being neglected),}$$

where $x = a - pm_1 = a - \frac{n_1m_1}{N} = a - \alpha$, which is numerically equal to $qN$,* and 

$$\sigma^2 = p(1 - p)\,m_1\left(1 - \frac{m_1}{N}\right) = \frac{n_1}{N}\cdot\frac{n_2}{N}\cdot\frac{m_1m_2}{N}.$$

The chance that so great a deviation, positive or negative, 
as $(a - \alpha)$ should occur is 

$$\frac{2}{\sqrt{2\pi}}\int_z^{\infty} e^{-\frac{1}{2}z^2}\,dz, \quad\text{where } z = \frac{x}{\sigma}.$$

Notice that 

$$z^2 = \frac{x^2}{\sigma^2} = q^2N^2\cdot\frac{N^3}{n_1n_2m_1m_2} = q^2N^2\cdot\frac{N(n_1 + n_2)(m_1 + m_2)}{n_1n_2m_1m_2}$$

$$= q^2N^2\left(\frac{N}{n_1m_1} + \frac{N}{n_1m_2} + \frac{N}{n_2m_1} + \frac{N}{n_2m_2}\right)$$

$$= \frac{(a - \alpha)^2}{\alpha} + \frac{(b - \beta)^2}{\beta} + \frac{(c - \gamma)^2}{\gamma} + \frac{(d - \delta)^2}{\delta} = \chi^2, \text{ say.}$$

E.g. in the distribution 

  65 | 235 
  35 | 165 

$n_1 = 300$, $n_2 = 200$, $m_1 = 100$, $m_2 = 400$, $N = 500$, 
$\sigma^2 = 19.2$, $\alpha = 60$, $x = 5$ (numerically $qN$), 
$z = 1.14$. $F(1.14) = .373$ (p. 271), and $2\{\frac{1}{2} - F(1.14)\} = .254$. 

* $q$ has the same meaning as in the previous paragraph and is not $1 - p$ 
as in Chapter II. 

The chance against obtaining 65 or more or 55 or less, 
when 100 are selected out of 500, and the chance in one selection 
is 300 : 500, is .746 to .254 or about 3 to 1. 
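
The numerical example can be reworked step by step. The Python sketch below takes the four compartments 65, 235, 35, 165 from the text, forms the chance values $\alpha$, $\beta$, $\gamma$, $\delta$, and evaluates $\sigma^2$, $z$, $\chi^2$ and the chance of so great a deviation in either direction. The use of `math.erfc` in place of the table of p. 271 is a convenience assumed here.

```python
from math import erfc, sqrt

# Compartments of the distribution quoted in the text.
a, b, c, d = 65, 235, 35, 165
n1, n2 = a + b, c + d          # row totals
m1, m2 = a + c, b + d          # column totals
N = n1 + n2

# Most probable numbers in a chance arrangement.
alpha, beta = n1 * m1 / N, n1 * m2 / N
gamma, delta = n2 * m1 / N, n2 * m2 / N

sigma_sq = n1 * n2 * m1 * m2 / N**3
z = (a - alpha) / sqrt(sigma_sq)

# Chance of so great a deviation, positive or negative: 2 * (1/2 - F(z)).
two_sided_chance = erfc(abs(z) / sqrt(2))

# The same z^2 obtained as the sum over the four compartments.
chi_sq = sum(
    (observed - expected) ** 2 / expected
    for observed, expected in ((a, alpha), (b, beta), (c, gamma), (d, delta))
)
```

This reproduces $\alpha = 60$, $\sigma^2 = 19.2$, $z = 1.14$ and a chance of about .254, and shows that $\chi^2$ summed over the four compartments is the same quantity as $z^2$.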

Given $n_1$, $m_1$, N and $a$, the remaining numbers $b$, $c$, $d$, $n_2$, $m_2$ 
are known, and the chance just found is equally the chance 
affecting any one of the numbers $b$, $c$, $d$ taken independently 
of each other. It should not be spoken of as the chance of 
the distribution as a whole; to find this we should need to 
know the chance $p$ from a wider universe, and not as determined 
from a limited number of observations, and also the 
general chance to which the observed proportion is the approximation. 

To illustrate this difficulty, we will consider a problem 
that has often been discussed. 



                    Not
                 vaccinated.    Vaccinated.    Totals.

Recovered .          a              b            n_1

Died .               c              d            n_2

Totals              m_1            m_2            N


In an epidemic of smallpox the number of cases is N, of whom $m_2$ were vaccinated, $n_2$ died, and other categories are as shown in the table.

The recovery rate, as shown by the whole statistics, is $n_1/N$, and if vaccination (whether directly, or by the other attributes correlated with it) had nothing to do with recovery, then the chance of a vaccinated or an unvaccinated patient's recovery would be $n_1/N$, and the chance that as many as b recover out of $m_2$ vaccinated is

$$\int_{b-\beta}^{\infty} \frac{N^{3/2}}{\sqrt{2\pi\,n_1 n_2 m_1 m_2}}\; e^{-\frac{N^{3}x^{2}}{2\,n_1 n_2 m_1 m_2}}\,dx,$$

where $qN = b - \beta = x$; if this is small there is evidence of a relation between vaccination (or the circumstances that lead to it) and recovery.

The rate $n_1/N$, however, is subject to a standard deviation of $\sqrt{n_1 n_2/N^{3}}$, and unless this is negligible its effect on the computed result should be tested.

A measure of the advantage (or disadvantage) of vaccination (apart from evidence of the existence of some effect) could conceivably be obtained by comparing the recovery rates $a/m_1$ and $b/m_2$, but apart from the statement of these rates and their standard deviations there is no direct method of procedure.

The question of the existence and of the measurement of
association becomes more complicated when, instead of simple 
alternatives in each attribute, we have several different classes, 
for example several grades of hair colour both in father and son. 

Professor Pearson has introduced the “ coefficient of 
contingency ” for the measurement of such a case (Drapers' 
Company Research Memoir, Biometric Series I, 1904; Elderton 
Chapter X). 

                     Numbers of Observations.

                          Classes of First Attribute.
                                                         Totals.
Classes of Second       a_1      a_2      . . .           n_1
Attribute.              b_1      b_2      . . .           n_2
                       . . .    . . .     . . .          . . .
Totals.                 m_1      m_2      . . .            N

Let $n_1, n_2 \ldots$, $m_1, m_2 \ldots$ be the totals of lines and columns as in the table, and N the total number of observations.

Then if $n_1/N$, $m_1/N$ are assumed to be the accurate proportions of the first class in each attribute to the total, the most probable number to be found in the position of $a_1$, if there was no association, would be $\alpha_1 = \dfrac{n_1}{N} \times \dfrac{m_1}{N} \times N$. Similarly values $\alpha_2 \ldots$, $\beta_1, \beta_2 \ldots$, $\gamma_1, \gamma_2 \ldots$ can be computed for the other places.

The divergences $a_1 - \alpha_1$, $a_2 - \alpha_2$, ... $b_1 - \beta_1$ ... afford some measure of association. Since an excess or defect is equally probable, it is convenient to take the squares, $(a_1 - \alpha_1)^{2}$ etc., instead of the linear quantities.

Analogy with the case of four categories suggests the formation of the function

$$\chi^{2} = \frac{(a_1 - \alpha_1)^{2}}{\alpha_1} + \frac{(a_2 - \alpha_2)^{2}}{\alpha_2} + \ldots + \frac{(b_1 - \beta_1)^{2}}{\beta_1} + \ldots \qquad (111)$$

This function also has a place in the measurement of the appropriateness of a formula to represent given observations (formula (130)).

The coefficient of contingency is then defined as

$$C = \frac{1}{\sqrt{1 + \dfrac{N}{\chi^{2}}}} = \sqrt{\frac{\chi^{2}}{N + \chi^{2}}} \qquad (112)$$

When χ = 0 and there is no association, C = 0.

As χ increases, C increases from 0, and tends towards 1 as $\chi^{2}/N$ becomes great.

$$\frac{\chi^{2}}{N} = -1 + \frac{a_1^{2}}{n_1 m_1} + \frac{a_2^{2}}{n_1 m_2} + \ldots,$$

and depends only on ratios, not on the whole number measured.

It can be shown (Elderton, p. 147) that if the numbers $a_1, a_2 \ldots$ etc. are those which occur in appropriate divisions of a normal correlation surface, and if the number of divisions is large, while N is so large that the smallest of the entries is not less than a small integer, then C approximates to r, the coefficient of correlation. This relation appears to have suggested the form of the function of $\chi^{2}$ which defines C.

The value obtained for C differs according to the number 
of divisions taken, and this consideration diminishes its utility 
as a measurement ; but a method has been given by which this 
difficulty can be overcome ( Biometrika , Vol. IX, pp. 116-139). 

The significance of particular values of C can only be 
appreciated by experience of many cases. 

It should be noticed that C, and the analysis of the previous 
paragraph, can be applied to cases where classes can be defined, 
but have no measurable attributes. 
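A minimal sketch of formulae (111) and (112) for a table of any size follows. The function name `contingency` and the 3 × 3 counts are hypothetical, chosen only for illustration; the sketch assumes that no line or column total is zero.

```python
import math

def contingency(table):
    """Return (chi2, C) for a rectangular table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    N = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / N   # e.g. alpha_1 = n_1 m_1 / N
            chi2 += (observed - expected) ** 2 / expected
    return chi2, math.sqrt(chi2 / (N + chi2))

# Hypothetical counts, for illustration only (not taken from the text).
hair = [[30, 12, 4],
        [10, 25, 9],
        [5, 11, 24]]
chi2, C = contingency(hair)
# Multiplying every entry by 10 multiplies chi2 by 10 but leaves C unchanged.
chi2_x10, C_x10 = contingency([[10 * v for v in row] for row in hair])
# A table whose lines are proportional shows no association.
chi2_ind, C_ind = contingency([[10, 20], [30, 60]])
print(round(chi2, 2), round(C, 3), round(C_x10, 3), C_ind)
```

The second call illustrates the remark that C depends only on ratios and not on the whole number measured.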


Correlation of Time Series.

So far we have been concerned with the correlation between 
two statistical groups where the measurements all relate to 
the same time ; we have still to consider how to test the 
relation between two series, where the pairs $x_1, y_1 \ldots x_p, y_p \ldots x_t, y_t$ are measurements of quantities at successive intervals. Here it is generally the case that one value of x is not independent of those that come before or after it, and the relationship found between x and y may merely reflect a general or periodic progress in time and not any more intimate connection. Nearly all series in time have a trend, and these trends, whether equally rapid or not, or in the same direction or not, will yield a high correlation coefficient even if the quantities are otherwise independent. For example the coefficient between

1, 2, 3 . . . 20

100, 98, 96 . . . 62

where $x_t = t$ and $y_t = 100 - 2(t - 1)$, is −1.
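The point can be checked numerically. In the sketch below the helper `pearson_r` and the second, alternating series are illustrative assumptions, not taken from the text: the deviations `d` and `−d` from the two trends are perfectly opposed, yet the coefficient between the series themselves is high and positive because both rise with time.

```python
import math

def pearson_r(xs, ys):
    """Ordinary product-moment coefficient of correlation."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

t = list(range(1, 21))
# The example of the text: two straight trends in opposite directions.
r_text = pearson_r(t, [100 - 2 * (k - 1) for k in t])

# Hypothetical series: rising trends with opposed oscillations about them.
d = [(-1) ** k for k in t]
x = [k + dk for k, dk in zip(t, d)]
y = [2 * k - dk for k, dk in zip(t, d)]
r_series = pearson_r(x, y)                        # dominated by the trends
r_deviations = pearson_r(d, [-dk for dk in d])    # the relation apart from time
print(r_text, round(r_series, 3), r_deviations)
```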

We need to find the correlation between the deviations after the time element is eliminated.

One method* is to obtain smoothed lines for each quantity, as above (pp. 132 seq.), to compute the differences between the observations of each year and the values given by the smoothed line, and to treat these differences as the quantities whose correlation should be measured; i.e. to measure the correlation between such quantities as those represented on the diagram (facing p. 155).

If the series are markedly periodic, the result would be only to bring out the correlation due to the periodicities, and these are better studied by harmonic analysis. And if the series are strongly “compensated” (p. 148), so that a positive deviation is generally followed by a negative one, the correlation would reflect this symptom.

But if the oscillations are random, so that apart from a 
regular trend the measurement of one year is unrelated to 
the measurement of adjacent years, the coefficient, calculated 
between the deviations from the smoothed lines, measures the 
same kind of relation as that already discussed in the correla- 
tion of groups. 

Let $x_p$, $y_p$ be a pair of measurements in the p th year, and let $\bar{x}_p$, $\bar{y}_p$ be the averages of the measurements of m years of which the p th is central, m being odd. Then the correlation coefficient to be calculated is that between $x_p - \bar{x}_p$ and $y_p - \bar{y}_p$.

* See Hooker, Statistical Journal, 1901, pp. 485 seq.

The method can also be applied if the smoothing is effected by the method recommended by Professor Persons (The Review of Economic Statistics, No. 1, 1919, Harvard University Press). In this method the average $\bar{y}$ is calculated for m years during which the trend appears to be in one direction and without any marked change of gradient, and the smoothed line is assumed to be of the form $y = \bar{y} + kt$, where t is the number of years from the centre of the period; k is determined by the condition that the sum of the squares of the deviations from this line shall be a minimum, viz., that $S\{y_t - (\bar{y} + kt)\}^{2}$ is a minimum, where $y_t$ is the observation t years from the centre. Then $k = \dfrac{S(t\,y_t)}{S(t^{2})}$.

If $m = 2n + 1$, $S\,t^{2} = 2(1^{2} + \ldots + n^{2}) = \dfrac{n(n+1)(2n+1)}{3}$.

This method overcomes partially a difficulty that occurs when moving averages are used in a case where the trend is continually concave (or convex), and the averages always below (or above) the observations.
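The straight-line smoothing just described can be sketched as follows. The names `persons_line` and `sum_t2` are illustrative, and the seven observations are hypothetical values lying exactly on a line, used only to check the formula for k.

```python
def sum_t2(n):
    """S t^2 for t = -n .. n, by the closed form n(n+1)(2n+1)/3."""
    return n * (n + 1) * (2 * n + 1) // 3

def persons_line(ys):
    """Fit y = ybar + k t to an odd number of yearly observations.

    t is counted from the central year; returns (ybar, k, deviations).
    """
    m = len(ys)
    if m % 2 == 0:
        raise ValueError("an odd number of years is required")
    n = m // 2
    ts = range(-n, n + 1)
    ybar = sum(ys) / m
    k = sum(t * y for t, y in zip(ts, ys)) / sum_t2(n)   # k = S(t y_t) / S(t^2)
    deviations = [y - (ybar + k * t) for t, y in zip(ts, ys)]
    return ybar, k, deviations

# Hypothetical observations lying exactly on y = 3 + 2t.
ybar, k, deviations = persons_line([3 + 2 * t for t in range(-3, 4)])
print(ybar, k, deviations)
```

The deviations returned are the quantities that would then be correlated with the corresponding deviations of the second series.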

Another method, introduced by Miss F. E. Cave (Royal Society’s Proceedings, Vol. LXXIV, p. 407, 1904) and by Mr. Hooker (Statistical Journal, 1905, pp. 696 seq.) and more recently developed by Professor Karl Pearson, Miss B. M. Cave and others, is to correlate not the observations but the differences between successive observations. A period of m + 1 years is selected, in which the observations are $x_0, x_1 \ldots x_m$ and $y_0, y_1 \ldots y_m$, and the coefficient of correlation between the pairs $x_1 - x_0,\ y_1 - y_0$; $x_2 - x_1,\ y_2 - y_1$; . . . $x_m - x_{m-1},\ y_m - y_{m-1}$ is calculated.

Since the average of the quantities $x_1 - x_0$, $x_2 - x_1$ . . . is $\dfrac{1}{m}(x_m - x_0)$, the deviations from the average which alone enter into the formula for r are the excess (or defect) of the increment in a particular year over the mean increment, or they may be described as the annual variations from the trend. If the smoothed lines of x and y are markedly concave or convex the correlation will be dominated by this symptom, but if the observations oscillate in an irregular way about a straight line, we shall obtain a measure of correlation independent of the element of time.

To get over the difficulty arising from concavity or convexity a more elaborate method, named by Professor Pearson that of “variate difference correlation,” has been devised.*

* Biometrika, Vol. X, pp. 179 seq. and pp. 340 seq.

This is based on the assumption that $x_p$ can be expressed as $x_p = X_p + bt_p + ct_p^{2} + \ldots$, where $X_p$ is independent of the influence of time, and the effect of the time can be expressed as a parabolic function: and similarly $y_p = Y_p + b't_p + \ldots$

$$\therefore\; x_{p+1} - x_p = X_{p+1} - X_p + b + c(2t_p + 1) + \ldots$$

Mr. Hooker's method ignores c and further constants.

The second difference gives

$$x_{p+1} - 2x_p + x_{p-1} = X_{p+1} - 2X_p + X_{p-1} + 2c + 6dt_p + \ldots$$

and its use ignores d etc., assuming a strict parabolic form.

It can be shown that when time is eliminated the correlation 
between any differences equals the correlation between X p and 
Y p . The process is complete when the correlation coefficient 
is no longer affected by proceeding to a further difference. 

There is, however, a great difficulty in applying this method to any differences except perhaps the first three, owing to the want of precision or the small number of significant figures in ordinary observations. The effect can be seen if we take the squares of the numbers 2.6, 2.7 . . . when written only to the first decimal place.

 6.8
 7.3
 7.8
 8.4
 9.0
 9.6
10.2
10.9

The second differences if written completely are all .02. The method is, in fact, too refined for ordinary statistical observations.
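The effect of the rounding can be verified with integer arithmetic, which avoids any doubt about decimal representation. In this sketch the exact squares are held in hundredths and the rounded squares in tenths; the helper name `second_differences` is illustrative.

```python
# Squares of 2.6, 2.7, ... 3.3, held as whole numbers of hundredths (676 = 6.76).
exact = [n * n for n in range(26, 34)]
# The same squares written to the first decimal place, in tenths (68 = 6.8).
rounded = [(v + 5) // 10 for v in exact]

def second_differences(seq):
    """x_{p+1} - 2 x_p + x_{p-1} for every interior term of the sequence."""
    return [seq[i + 1] - 2 * seq[i] + seq[i - 1] for i in range(1, len(seq) - 1)]

exact_d2 = second_differences(exact)       # in hundredths
rounded_d2 = second_differences(rounded)   # in tenths
print(rounded)
print(exact_d2)
print(rounded_d2)
```

Written completely, every second difference is 2 hundredths; after rounding they jump between 0 and 1 tenth, so that the rounding error is several times the quantity being measured.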

The difference between the methods may be exhibited as follows.

The x quantity that is correlated, if we use the fourth difference, is

$$6\{x_0 + \tfrac{1}{6}(x_2 + x_{-2}) - \tfrac{2}{3}(x_1 + x_{-1})\},$$

where the suffixes mark the distance to right or left of the centre; here the extreme terms increase the expression.

If we take the moving average based on five terms, the quantity is

$$x_0 - \tfrac{1}{5}(x_2 + x_1 + x_0 + x_{-1} + x_{-2}) = \tfrac{4}{5}\{x_0 - \tfrac{1}{4}(x_2 + x_{-2}) - \tfrac{1}{4}(x_1 + x_{-1})\},$$

and the extreme terms diminish the expression.

The eighth difference gives (apart from a constant factor)

$$x_0 - \tfrac{4}{5}(x_1 + x_{-1}) + \tfrac{2}{5}(x_2 + x_{-2}) - \tfrac{4}{35}(x_3 + x_{-3}) + \tfrac{1}{70}(x_4 + x_{-4}),$$

while the moving average gives (apart from a constant factor)

$$x_0 - \tfrac{1}{8}(x_1 + x_{-1}) - \tfrac{1}{8}(x_2 + x_{-2}) - \tfrac{1}{8}(x_3 + x_{-3}) - \tfrac{1}{8}(x_4 + x_{-4}).$$

On the other hand, the second difference gives (apart from a constant factor)

$$x_0 - \tfrac{1}{3}(x_1 + x_0 + x_{-1}),$$

and is the same as that obtained from a moving average based on only three terms, and therefore subject to a very considerable chance error.
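The weights quoted for the two methods can be generated mechanically. The sketch below uses exact fractions; the function names are illustrative, and in each case the weights are scaled so that the central term $x_0$ has weight 1, which is the sense in which a constant factor is disregarded.

```python
from fractions import Fraction
from math import comb

def difference_weights(k):
    """Weights of x_j in the k-th (even) central difference, x_0 having weight 1."""
    half = k // 2
    raw = [(-1) ** j * comb(k, j) for j in range(k + 1)]   # binomial coefficients
    return {j - half: Fraction(raw[j], raw[half]) for j in range(k + 1)}

def moving_average_weights(m):
    """Weights of x_j in x_0 minus the m-term moving average, x_0 having weight 1."""
    half = m // 2
    return {j: Fraction(1) if j == 0 else Fraction(-1, m - 1)
            for j in range(-half, half + 1)}

fourth = difference_weights(4)
eighth = difference_weights(8)
second = difference_weights(2)
five_term = moving_average_weights(5)
three_term = moving_average_weights(3)
print(fourth, eighth, sep="\n")
```

The second difference and the three-term moving average give identical weights, while for the higher differences the extreme terms enter with alternating signs instead of uniformly negative ones.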

The various methods need further examination and more 
experience of their applicability. It appears that the moving 
average does not give the right importance to extreme terms, 
while the difference method is too sensitive to the effect of 
roughness in observations. In either case, the resulting 
measurement of correlation depends on the assumptions 
made, and is not so easily intelligible as in the measurement 
of correlation of groups. 

Graphic Comparison of Series. 

Apart from the determination of a measurement of correla- 
tion, the problem arises of how best to exhibit the relationship 
graphically. 

The following method is useful. Let $x_1, x_2 \ldots x_n$, $y_1, y_2 \ldots y_n$ be deviations from moving averages (as in the table on p. 387), or (if there is no trend) be actual measurements, and let $\bar{x}$ and $\bar{y}$ be their averages.

Make a graph of the values of y on any convenient scale, time being measured horizontally. Then the x values may be placed on this diagram on any scale and with any origin.

Take b as the origin for x, and let 1 unit of x correspond to c units of y. b and c are to be chosen. A convenient method is to select them so that the sum of the squares of the vertical distances between the points representing pairs such as $x_1, y_1$ shall be a minimum (Appendix, Note 10).

That is $S\{c(x + b) - y\}^{2}$ is to be a minimum.

By differentiating with regard to b and to c, it is found that

$$c = \frac{S(x - \bar{x})(y - \bar{y})}{n\sigma_1^{2}} \quad\text{and}\quad c(\bar{x} + b) = \bar{y},$$

where $\sigma_1$ is the standard deviation of the x’s.

The averages of the deviations should therefore be marked at the same point on the vertical scale, and the differences from their average of the x’s should be multiplied by $\dfrac{r\sigma_2}{\sigma_1}$ and then measured on the y scale above and below the average, $\sigma_2$ being the standard deviation of the y’s and r the coefficient of correlation.
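A minimal sketch of this choice of origin and scale is given here. The function names and the five pairs of values are hypothetical; the sketch assumes the x’s are not all equal and that the series are correlated, so that c is not zero.

```python
def overlay_scale(xs, ys):
    """Origin b and scale c that minimise S{c(x + b) - y}^2."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)                      # n * sigma_1^2
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    c = sxy / sxx                                               # equals r sigma_2 / sigma_1
    b = ybar / c - xbar                                         # from c(xbar + b) = ybar
    return b, c

def squared_distance(xs, ys, b, c):
    """Sum of squares of the vertical distances between the two plotted series."""
    return sum((c * (x + b) - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical measurements, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]
b, c = overlay_scale(xs, ys)
best = squared_distance(xs, ys, b, c)
print(b, c, best)
```

Any other origin or scale gives a larger sum of squares, which is the sense in which the two curves are brought as close together as the data allow.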

An example is given in measuring unemployment in the 
Statistical Journal , 1912, pp. 799-800. 


Note added in 1936. — Professor R. M. Fréchet* has developed an interesting relationship between r and η (the correlation ratio).

With the notation of pp. 363-6, $N\eta^{2}\sigma_y^{2} = S(n_s\bar{y}_s^{2})$ and $S(xy) = S(n_s\bar{y}_s x_s)$, so that

$$r^{2} = \eta^{2} \times \frac{\{S(n_s x_s\bar{y}_s)\}^{2}}{S(n_s x_s^{2})\;S(n_s\bar{y}_s^{2})}, \quad\text{or}\quad r = \eta \times \rho.$$

Here ρ is the coefficient of correlation when in each array all the observations are collected at the average, that is $n_s$ objects are at $x_s, \bar{y}_s$, etc., for $N\sigma_x^{2} = S(x^{2}) = S(n_s x_s^{2})$.

Thus r can be resolved into two factors: the correlation ratio and another correlation coefficient. “Nous avons fait observer que ce facteur ρ est celui qui dépend seulement de la forme de la relation tandis que la rigueur de la dépendance n’intervient que dans η.” “Deux facteurs l’un, ρ, qui ne dépend que de la forme de la ligne des moyennes, l’autre, η, qui n’en dépend pas ou à peu près pas.”

Notice that if η = 1, r = ρ, while (as shown on p. 366 above) the standard deviation of every array = 0. In such a case, when an x is given the value of y is completely determined.

If ρ = 1, the line of means is perfectly rectilinear.

Maximum Value of the Coefficient of Contingency. — If relationship is perfect, all the objects in a given class of the first attribute are found in one class of the second. Without loss of generality (in a table where there are l rows and l columns) we can write this condition as $a_1 = n_1 = m_1$, $b_2 = n_2 = m_2 \ldots$, while all other compartments are empty.

Then $\alpha_1 = n_1^{2}/N$, $\beta_2 = n_2^{2}/N \ldots$, and $0 = a_2 = a_3 \ldots = b_1 = b_3 \ldots$

Now

$$\chi^{2} = a_1^{2}/\alpha_1 - 2a_1 + \alpha_1 + \text{similar terms} = \Sigma(a_1^{2}/\alpha_1) - 2N + N = \Sigma\,a_1^{2}/\alpha_1 - N$$

$$= n_1^{2} \div n_1^{2}/N + n_2^{2} \div n_2^{2}/N + \ldots \text{ to } l \text{ terms} - N = (l - 1)N.$$

$\therefore\; C = 1 \div \sqrt{1 + 1/(l - 1)} = \sqrt{1 - 1/l}$, which tends to unity as l is increased, but is definitely less than unity for ordinary small values of l.

* See Revue de l’Institut International de Statistique, 3 Année, Livraison 4, pp. 3^5-79, especially pp. 366-7.


EXAMPLES OF CORRELATION.


In this section the results of several experiments and 
observations are given to illustrate the theory discussed above, 
and to show the arithmetical working of the measurements. 

It should be premised that the theoretical value of r would only be obtained exactly in an infinite number of observations. It is shown in the chapter on probable errors that r, as calculated from n pairs, may differ from its true value by an amount whose standard deviation measured on the normal scale of error is $\dfrac{1 - r^{2}}{\sqrt{n}}$. Thus in the first example the correlation coefficient is known to be .6; 24 pairs are taken, and we should expect to be within $\dfrac{1 - .6^{2}}{\sqrt{24}} = .13$ of .6, while it is very unlikely that the difference would amount to 3 times .13. Conversely, if we do not know the coefficient a priori, we must read with our calculated value $\pm\dfrac{1 - r^{2}}{\sqrt{n}}$.

Some of the examples are intended to show simply the arithmetical methods of working out r from the observations.

In others when the observations are numerous the averages of arrays are obtained and comparison is made with the equation $y - \bar{y} = r\dfrac{\sigma_2}{\sigma_1}(x - \bar{x})$, which is the locus of these averages if regression is rectilinear.

In the final example the distribution of 1,000 pairs is 
compared in detail with the distribution given by the theoretical 
correlation surface. 

In general x and y are measured not from their averages but from an arbitrary origin and then $r = \left(\dfrac{S(xy)}{n} - \bar{x}\bar{y}\right) \div \sigma_1\sigma_2$, by formula (93).

Example 1. — To obtain a simple illustration of correlation when all the circumstances were known and the coefficient could be stated a priori, digits were taken from a mathematical table at random. $x_t$ was taken as the sum of 5 digits, and $y_t$ also as the sum of 5 digits of which 3 were included in the 5 which made $x_t$ and 2 were different, and 24 pairs $(x_1, y_1) \ldots (x_t, y_t) \ldots$ were formed. The correlation coefficient for such pairs is $\tfrac{3}{5}$ (formula (96)). In the example in which only 24 pairs were taken it was .537; the standard deviation of the coefficient $\tfrac{3}{5}$ is $\dfrac{1 - (\tfrac{3}{5})^{2}}{\sqrt{24}} = .13$, so that the deficit from so small a number is not remarkable.

The following table shows the working.

      x       y       x²       y²       xy
     22      32      484    1,024      704
     27      27      729      729      729
     12      19      144      361      228
     21      30      441      900      630
     21      26      441      676      546
     27      26      729      676      702
     23      25      529      625      575
     17      22      289      484      374
     25      23      625      529      575
     11       9      121       81       99
     16      24      256      576      384
     20      28      400      784      560
     37      29    1,369      841    1,073
     33      25    1,089      625      825
     18      20      324      400      360
     24      26      576      676      624
     22      17      484      289      374
     17      16      289      256      272
     32      27    1,024      729      864
     29      29      841      841      841
     26      17      676      289      442
     27      20      729      400      540
     26      26      676      676      676
     21      17      441      289      357

    554     560   13,706   13,756   13,354

$n = 24$, $\bar{x} = 23\tfrac{1}{12}$, $\bar{y} = 23\tfrac{1}{3}$.

$24\,\sigma_1^{2} = 13706 - 24\,(23\tfrac{1}{12})^{2}$, $\sigma_1 = 6.18$, $\sigma_2 = 5.36$.

$r = \dfrac{13354 - 24\,\bar{x}\bar{y}}{24\,\sigma_1\sigma_2} = .537$.

The arithmetic is simpler if x and y are measured from an origin 23 in each case.
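The working of Example 1 can be reproduced directly from the 24 pairs. The sketch below follows the formula with an arbitrary origin (zero); the variable names are illustrative only.

```python
import math

x = [22, 27, 12, 21, 21, 27, 23, 17, 25, 11, 16, 20,
     37, 33, 18, 24, 22, 17, 32, 29, 26, 27, 26, 21]
y = [32, 27, 19, 30, 26, 26, 25, 22, 23, 9, 24, 28,
     29, 25, 20, 26, 17, 16, 27, 29, 17, 20, 26, 17]

n = len(x)
sx, sy = sum(x), sum(y)
sxx = sum(v * v for v in x)
syy = sum(v * v for v in y)
sxy = sum(a * b for a, b in zip(x, y))

xbar, ybar = sx / n, sy / n
sigma1 = math.sqrt(sxx / n - xbar ** 2)
sigma2 = math.sqrt(syy / n - ybar ** 2)
r = (sxy / n - xbar * ybar) / (sigma1 * sigma2)

# Standard deviation of r when the true coefficient is 3/5 and n = 24.
sd_r = (1 - 0.6 ** 2) / math.sqrt(n)
print(sx, sy, sxx, syy, sxy)
print(round(sigma1, 2), round(sigma2, 2), round(r, 3), round(sd_r, 2))
```

The observed coefficient falls short of .6 by about half of its standard deviation.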

Example 2. — Where we have few and sporadic observations, it is simpler to work out the arithmetic in full. For example the infantile mortality in 26 towns is in the adjoining table compared with the population (to the nearest 1,000) of these towns. r is only twice its standard deviation, and its exact value is therefore uncertain, but there is evidence that the larger the towns the higher the mortality. To attack the question of the causes of infantile mortality seriously, it would of course be necessary to take many more instances and to consider many other factors besides crude population.


Population and Infantile Mortality in 26 Towns.

Population.   Mortality.
     x             y           xy
    55            162        8,910
    39            201        7,839
    36            241        8,676
    35            162        5,670
    31            179        5,549
    30            174        5,220
    27            176        4,752
    24            208        4,992
    24            163        3,912
    23            206        4,738
    22            172        3,784
    20            200        4,000
    19            218        4,142
    19            198        3,762
    19            132        2,508
    16            155        2,480
    15            148        2,220
    15            220        3,300
    15            141        2,115
    12            169        2,028
     7            155        1,085
     6            129          774
     6            167        1,002
     5            150          750
     5            171          855
     4            161          644

Totals .   529          4,558       95,707

Averages   20.346       175.31        —

Also $\sigma_1 = 12.2$, $\sigma_2 = 27.9$.

$r = \dfrac{S\,xy - n\bar{x}\bar{y}}{26\,\sigma_1\sigma_2} = .34$, where $n = 26$, $\bar{x} = 20.346$, $\bar{y} = 175.31$.

Standard deviation of $r = \dfrac{1 - .34^{2}}{\sqrt{26}} = .17$.


Example 3. — A good illustrative example of method is 
obtained from statistics arising from the North Sea Fisheries 
Investigation of the size of herrings in relation to the rings 
which appear on their bodies and which are believed to show 
their age, one ring being formed each year. 

The averages of arrays lie very near the theoretic straight 
line of regression, in spite of the skewness of the original 
curves. 

In the table the size is measured on the axis of y, with origin at 31 cm. and unit 1 cm., and the number of rings is measured on the axis of x with origin at 7 rings and unit 1 ring, and the numbers of cases are entered in a square table. $n_2$ is the total of cases for a given value of y, and $n_2 y$ and $n_2 y^{2}$ and their sums are obtained in the last two columns, which lead to the average and standard deviation of the sizes. Similarly $n_1$ is the total of cases in an x array, and the sums of $n_1 x$ and $n_1 x^{2}$ lead to the average and standard deviation of the rings.

In the last line the average in each array is given, obtained in each x array by multiplying the numbers of cases by the corresponding values of y and dividing the sum by $n_1$.

Underneath each number of cases is given in brackets the corresponding value of x × y; thus in the column under x = −1 in the row y = 3, we have four cases and xy = −3, so that the contribution of these four cases to the sum of xy is 4 × −3 = −12. The various terms thus contributed are shown below grouped in the four quadrants.

The origins are so chosen as to include as many zero terms in S xy as possible. (Compare Yule, Theory of Statistics, p. 183.)


Herring. Number of Rings and Size (Length in Centimetres).

Rings            4     5     6     7     8     9    10    11    12    13
       x        −3    −2    −1     0     1     2     3     4     5     6     n_2   n_2 y  n_2 y²
Size   y
  35   4         —     —     1     —     1     2     2     —     —     —       6      24      96
                          (−4)         (4)   (8)  (12)
  34   3         —     1     4     4    15    14     7     3     1     1      50     150     450
                    (−6)  (−3)         (3)   (6)   (9)  (12)  (15)  (18)
  33   2         1     1    11    26    26    22    11     3     3     1     105     210     420
              (−6)  (−4)  (−2)         (2)   (4)   (6)   (8)  (10)  (12)
  32   1         1    24    49    53    26     7     5     1     —     —     166     166     166
              (−3)  (−2)  (−1)         (1)   (2)   (3)   (4)
  31   0         —    28    43    45    21     6     2     —     —     —     145       0       0
  30  −1         1    15    21    16     7     1     —     —     —     —      61     −61      61
               (3)   (2)   (1)        (−1)  (−2)
  29  −2         2     3     5     —     —     —     —     —     —     —      10     −20      40
               (6)   (4)   (2)
  28  −3         1     1     3     2     —     —     —     —     —     —       7     −21      63
               (9)   (6)   (3)
  27  −4         —     3     —     —     —     —     —     —     —     —       3     −12      48
                     (8)
  26  −5         —     1     —     —     —     —     —     —     —     —       1      −5      25
                    (10)

Totals n_1       6    77   137   146    96    52    27     7     4     2     554     431   1,369

n_1 x          −18  −154  −137     0    96   104    81    28    20    12     S n_1 x = 32
n_1 x²          54   308   137     0    96   208   243   112   100    72     S n_1 x² = 1330

Averages in
arrays       30.17 30.85 31.34 31.65 32.25 32.92 33.07  33.3     —     —

$\bar{x} = 32 \div 554 = .0578$. Average, 7.0578 rings.
$\sigma_1^{2} = 1330 \div 554 - .0578^{2} - .083$,* $\sigma_1 = 1.521$.
$\bar{y} = 431 \div 554 = .778$. Average, 31.778 cm.
$\sigma_2^{2} = 1369 \div 554 - .778^{2} - .083$,* $\sigma_2 = 1.335$.


* Sheppard's corrections, Appendix, Note 5.

Contributions to S xy, grouped in the four quadrants.

x +, y + :  4, 16, 24, 45, 84, 63, 36, 15, 18, 52, 88, 66, 24, 30, 12, 26, 14, 15, 4.   Total 636
x −, y − :  3, 30, 21, 12, 12, 10, 9, 6, 9, 24, 10.   Total 146
x +, y − :  7, 2.   Total 9
x −, y + :  4, 6, 12, 6, 4, 22, 3, 48, 49.   Total 154

S xy = 636 + 146 − 9 − 154 = 619.

$r = \dfrac{S\,xy - 554\,\bar{x}\bar{y}}{554\,\sigma_1\sigma_2} = .528$.

Standard deviation of r = .031.

The regression equation is

$$\frac{\text{Length} - 31.778 \text{ cm.}}{1.335 \text{ cm.}} = r \times \frac{\text{Number of rings} - 7.0578}{1.521}.$$

Number of     Length deduced     Average of
 rings.       from equation.       arrays.
                   cm.               cm.
    4             30.36             30.17
    5             30.82             30.85
    6             31.29             31.34
    7             31.75             31.65
    8             32.21             32.25
    9             32.68             32.92
   10             33.14             33.07
   11             33.60             33.3
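The whole of Example 3 can be worked from the square table. In the sketch below the cell counts are a reading of the printed table in which doubtful entries have been settled so as to agree with the printed totals of rows and columns; that reading, and all names used, are assumptions of this sketch. Sheppard's correction is applied by subtracting 1/12 from each variance.

```python
import math

xs = range(-3, 7)   # number of rings, measured from 7
table = {           # y (size measured from 31 cm.) -> cases for x = -3 .. 6
    4:  [0, 0, 1, 0, 1, 2, 2, 0, 0, 0],
    3:  [0, 1, 4, 4, 15, 14, 7, 3, 1, 1],
    2:  [1, 1, 11, 26, 26, 22, 11, 3, 3, 1],
    1:  [1, 24, 49, 53, 26, 7, 5, 1, 0, 0],
    0:  [0, 28, 43, 45, 21, 6, 2, 0, 0, 0],
    -1: [1, 15, 21, 16, 7, 1, 0, 0, 0, 0],
    -2: [2, 3, 5, 0, 0, 0, 0, 0, 0, 0],
    -3: [1, 1, 3, 2, 0, 0, 0, 0, 0, 0],
    -4: [0, 3, 0, 0, 0, 0, 0, 0, 0, 0],
    -5: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
}
cells = [(x, y, c) for y, row in table.items() for x, c in zip(xs, row)]

N = sum(c for _, _, c in cells)
Sx = sum(c * x for x, _, c in cells)
Sxx = sum(c * x * x for x, _, c in cells)
Sy = sum(c * y for _, y, c in cells)
Syy = sum(c * y * y for _, y, c in cells)
Sxy = sum(c * x * y for x, y, c in cells)

xbar, ybar = Sx / N, Sy / N
sigma1 = math.sqrt(Sxx / N - xbar ** 2 - 1 / 12)   # Sheppard's correction
sigma2 = math.sqrt(Syy / N - ybar ** 2 - 1 / 12)
r = (Sxy / N - xbar * ybar) / (sigma1 * sigma2)

def length_from_equation(rings):
    """Length in cm. given by the regression equation for a number of rings."""
    return 31 + ybar + r * sigma2 / sigma1 * ((rings - 7) - xbar)

print(N, Sx, Sxx, Sy, Syy, Sxy)
print(round(sigma1, 3), round(sigma2, 3), round(r, 3))
print([round(length_from_equation(k), 2) for k in range(4, 12)])
```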


Example 4. — The following example is given to illustrate the value that may be obtained for r, when in the nature of the case there can be little correlation. For x the last digit of each of the (7 figure) logarithms of the numbers 2500-2549 and 2600-2649 was taken; for y the last digit of the logarithm of a number 50 greater than that used for x, i.e. 2550-2599 and 2650-2699. r is found to be −.037, which is less than its standard deviation for 100 pairs.

At the same time is shown an alternative method of setting out the arithmetic, which in some cases is simpler than the other methods used in this section.

This method leads readily also to the calculation of the correlation ratio.


Occurrences of Pairs of Digits.

                         y
  x     0   1   2   3   4   5   6   7   8   9     n_s    S y    x S y   n_s ȳ_s²
  0     —   —   1   —   1   1   1   —   —   4       8     53       0      351
  1     3   —   —   1   1   —   —   1   1   1       8     31      31      120
  2     —   2   1   —   3   1   —   —   —   1       8     30      60      112
  3     2   —   —   1   —   1   —   2   2   3      11     65     195      384
  4     1   2   2   —   —   2   1   3   1   —      12     51     204      217
  5     1   1   —   —   1   1   4   1   —   —       9     41     205      187
  6     1   1   1   2   —   1   —   2   1   —       9     36     216      144
  7     3   3   2   1   1   3   1   —   3   1      18     68     476      257
  8     —   2   —   3   2   1   —   —   2   1      11     49     392      218
  9     —   —   —   1   —   —   —   1   1   3       6     45     405      338

       11  11   7   9   9  11   7  10  11  14     100    469   2,184    2,328

$\bar{x} = 4.72$, $\sigma_1 = 2.69$, $\bar{y} = 4.69$, $\sigma_2 = 3.03$, $n = 100$.

$S(x - \bar{x})(y - \bar{y}) = S\,xy - 100\,\bar{x}\bar{y} = S(x\,Sy) - 100\,\bar{x}\bar{y} = 2184 - 2214 = -30$.

$r = \dfrac{-30}{100\,\sigma_1\sigma_2} = -.037$. Standard deviation of r is .1, approx.

Here $n_s$ is the number of times the various x digits 0, 1 . . . were found. Sy is the sum of the corresponding y’s; e.g. in the first line we have 2 + 4 + 5 + 6 + 9 × 4 = 53. $\bar{y}_s$ is the average y for a given x, and equals $Sy \div n_s$, and $n_s\bar{y}_s^{2} = (Sy)^{2} \div n_s$.

The correlation ratio is found from the last column (see p. 366).

$100\,\sigma_m^{2} = S\,n_s(\bar{y}_s - \bar{y})^{2}$* $= S\,n_s\bar{y}_s^{2} - 2\bar{y}\,S\,n_s\bar{y}_s + n\bar{y}^{2} = S\,n_s\bar{y}_s^{2} - n\bar{y}^{2} = 2328 - 2200 = 128$.

$\eta = \dfrac{\sigma_m}{\sigma_2} = \dfrac{1.13}{3.03} = .37$, and has a considerable value though the correlation coefficient is insignificant.
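The alternative arrangement of Example 4 lends itself to a short sketch: one pass over each line of the table gives $n_s$ and Sy, from which both r and the correlation ratio follow. The table is entered as read from the text; the variable names are illustrative.

```python
import math

counts = [  # line = x digit 0..9, column = y digit 0..9
    [0, 0, 1, 0, 1, 1, 1, 0, 0, 4],
    [3, 0, 0, 1, 1, 0, 0, 1, 1, 1],
    [0, 2, 1, 0, 3, 1, 0, 0, 0, 1],
    [2, 0, 0, 1, 0, 1, 0, 2, 2, 3],
    [1, 2, 2, 0, 0, 2, 1, 3, 1, 0],
    [1, 1, 0, 0, 1, 1, 4, 1, 0, 0],
    [1, 1, 1, 2, 0, 1, 0, 2, 1, 0],
    [3, 3, 2, 1, 1, 3, 1, 0, 3, 1],
    [0, 2, 0, 3, 2, 1, 0, 0, 2, 1],
    [0, 0, 0, 1, 0, 0, 0, 1, 1, 3],
]
n_s = [sum(row) for row in counts]                              # cases in each x array
S_y = [sum(y * c for y, c in enumerate(row)) for row in counts]
col = [sum(c) for c in zip(*counts)]                            # cases for each y digit
n = sum(n_s)

xbar = sum(x * k for x, k in enumerate(n_s)) / n
ybar = sum(S_y) / n
sigma_x = math.sqrt(sum(x * x * k for x, k in enumerate(n_s)) / n - xbar ** 2)
sigma_y = math.sqrt(sum(y * y * k for y, k in enumerate(col)) / n - ybar ** 2)

S_xSy = sum(x * sy for x, sy in enumerate(S_y))                 # S(x Sy) = S xy
r = (S_xSy / n - xbar * ybar) / (sigma_x * sigma_y)

# n sigma_m^2 = S n_s ybar_s^2 - n ybar^2, where n_s ybar_s^2 = (Sy)^2 / n_s
between = sum(sy * sy / k for sy, k in zip(S_y, n_s)) - n * ybar ** 2
eta = math.sqrt(between / n) / sigma_y
print(n_s, S_y, S_xSy)
print(round(sigma_x, 2), round(sigma_y, 2), round(r, 3), round(eta, 2))
```

Worked without rounding the intermediate products, r comes out at about −.036, which agrees with the value in the text to the accuracy of the arithmetic there.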


Example 5. — The following table gives data from “The Report on Heights and Weights of New York City Children” for 3,405 boys aged 14–15.

  Height.      Number.    Average weight                              Average weight
  Origin                  as in report.                               from equation.
 61 inches.               Origin 100 lbs.                             Origin 100 lbs.
     x           n_s           ȳ_s         n_s ȳ_s     x n_s ȳ_s                        n_s ȳ_s²
   −12             1          −12             −12          144           −49.6               144
    −9             1          −20             −20          180           −36.6               400
    −7            13          −18            −234        1,638           −27.9             4,212
    −6            59          −19          −1,121        6,726           −23.6            21,299
    −5            96          −17          −1,632        8,160           −19.2            27,744
    −4           190          −14          −2,660       10,640           −14.9            37,240
    −3           283          −12          −3,396       10,188           −10.5            40,752
    −2           349           −8          −2,792        5,584            −6.2            22,336
    −1           440           −3          −1,320        1,320            −1.9             3,960
     0           434           +2            +868            0            +2.5             1,736
    +1           400           +7          +2,800        2,800            +6.8            19,600
    +2           355          +11          +3,905        7,810           +11.2            42,955
    +3           307          +17          +5,219       15,657           +15.5            88,723
    +4           200          +20          +4,000       16,000           +20.0            80,000
    +5           137          +24          +3,288       16,440           +24.2            78,912
    +6            78          +30          +2,340       14,040           +28.6            70,200
    +7            34          +35          +1,190        8,330           +32.9            41,650
    +8            15          +34            +510        4,080           +37.2            17,340
    +9             6          +42            +252        2,268           +41.6            10,584
   +10             7          +42            +294        2,940           +45.9            12,348

 Totals .      3,405           —           11,479      134,945             —             622,135

* On p. 366 the y’s are measured from their average. Here it is necessary to subtract $\bar{y}$ throughout.


$\bar{x} = .2270$, $\sigma_1 = 2.99$. Average 61.227 inches.
$\bar{y} = 3.371$, $\sigma_2 = 16.3$.* Average 103.37 lbs.

$$r = \frac{S\,xy - n\bar{x}\bar{y}}{n\,\sigma_1\sigma_2} = \left(\frac{134945}{3405} - .2270 \times 3.371\right) \div (2.99 \times 16.3) = .797$$

The regression equation is

$$\frac{\text{Weight} - 103.37}{16.3} = .797 \text{ of } \frac{\text{Height} - 61.227}{2.99},$$

or Weight = 103.4 + 4.345 (Height − 61.23).


The weights obtained from this equation are given in the 
sixth column of the table and should be compared with average 
weights corresponding to various heights given in the third 
column. The agreement is close from about 57 inches to 
70 inches ; but below 57 inches actual weights do not fall off 
so rapidly as in the formula. The regression is not in fact 
linear for low statures.


$3405\,\sigma_m^{2} = S\,n_s\bar{y}_s^{2} - 3405\,\bar{y}^{2}$, and $\sigma_m = 13.1$.
$\eta = \sigma_m \div \sigma_2 = .80$.

Here the correlation ratio is practically the same as the 
correlation coefficient. 
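The figures of Example 5 can be recomputed from the first three columns of the table. In this sketch the standard deviation of the weights, 16.3 lbs., is taken as given in the text, since the data from which it was calculated are not reproduced; the variable names are illustrative.

```python
import math

height = [-12, -9, -7, -6, -5, -4, -3, -2, -1, 0,
          1, 2, 3, 4, 5, 6, 7, 8, 9, 10]                 # inches from 61
number = [1, 1, 13, 59, 96, 190, 283, 349, 440, 434,
          400, 355, 307, 200, 137, 78, 34, 15, 6, 7]
weight = [-12, -20, -18, -19, -17, -14, -12, -8, -3, 2,
          7, 11, 17, 20, 24, 30, 35, 34, 42, 42]         # array averages, lbs. from 100
SIGMA_2 = 16.3   # standard deviation of all weights, as stated in the text

N = sum(number)
S_ny = sum(n * y for n, y in zip(number, weight))
S_xny = sum(x * n * y for x, n, y in zip(height, number, weight))
S_nyy = sum(n * y * y for n, y in zip(number, weight))

xbar = sum(x * n for x, n in zip(height, number)) / N
ybar = S_ny / N
sigma1 = math.sqrt(sum(n * x * x for x, n in zip(height, number)) / N - xbar ** 2)

r = (S_xny / N - xbar * ybar) / (sigma1 * SIGMA_2)
slope = r * SIGMA_2 / sigma1                  # lbs. per inch in the regression equation
sigma_m = math.sqrt(S_nyy / N - ybar ** 2)    # standard deviation of the array averages
eta = sigma_m / SIGMA_2
print(N, S_ny, S_xny, S_nyy)
print(round(xbar, 4), round(sigma1, 2), round(r, 3), round(slope, 3),
      round(sigma_m, 1), round(eta, 2))
```

The closeness of η to r is the numerical expression of the remark that the regression is nearly rectilinear over most of the range.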

Example 6. — The methods discussed on pp. 374-8 for measuring the correlation between two time-series are worked out by comparing the value of imports into the United Kingdom per head of the population with the marriage rate in England and Wales, year by year.

x is the excess of the imports in any year over the average of the five years of which the year in question is central. y is similarly obtained from the marriage rate.

$\bar{x} = -.62$, $\bar{y} = -.3$, $\sigma_1 = 36.9$, $\sigma_2 = 3.59$, $S\,xy = 4309$, $n = 50$.

$$r = \frac{\dfrac{4309}{50} - .62 \times .3}{36.9 \times 3.59} = .65, \text{ with standard deviation } \frac{1 - .65^{2}}{\sqrt{50}} = .08.$$

This is the measurement of the correlation by the use of moving averages.


* Calculated from data not reproduced here.

             Imports per head.            Marriage Rate. England and Wales.          xy
Years.   Annual.  Average.  Deviation     Annual.  Average.  Deviation            +       −
                                x                                y
1845       3.30      —          —          17.2       —          —                —       —
1846       3.15      —          —          17.2       —          —                —       —
1847       3.21     3.22       −1          15.8      16.5       −7                7       —
1848       2.91     3.35      −44          15.9      16.5       −6              264       —
1849       3.52     3.55       −3          16.2      16.5       −3                9       —
1850       3.97     3.78      +19          17.2      16.9       +3               57       —
1851       4.14     4.30      −16          17.2      17.2        0                0       —
1852       4.35     4.70      −35          17.4      17.4        0                0       —
1853       5.51     4.93      +58          17.9      17.2       +7              406       —
1854       5.51     5.34      +17          17.2      17.1       +1               17       —
1855       5.16     5.80      −64          16.2      16.9       −7              448       —
1856       6.16     5.86      +30          16.7      16.5       +2               60       —
1857       6.66     6.01      +65          16.5      16.5        0                0       —
1858       5.80     6.44      −64          16.0      16.7       −7              448       —
1859       6.26     6.71      −45          17.0      16.6       +4                —     180
1860       7.32     6.92      +40          17.1      16.5       +6              240       —
1861       7.50     7.45       +5          16.3      16.7       −4                —      20
1862       7.72     8.05      −33          16.1      16.7       −6              198       —
1863       8.45     8.40       +5          16.8      16.8        0                0       —
1864       9.26     8.86      +40          17.2      17.0       +2               80       —
1865       9.06     9.12       −6          17.5      17.1       +4                —      24
1866       9.80     9.35      +45          17.5      17.0       +5              225       —
1867       9.05     9.41      −36          16.5      16.7       −2               72       —
1868       9.60     9.54       +6          16.1      16.4       −3                —      18
1869       9.54     9.68      −14          15.9      16.3       −4               56       —
1870       9.70    10.09      −39          16.1      16.4       −3              117       —
1871      10.49    10.48       +1          16.7      16.7        0                0       —
1872      11.13    10.85      +28          17.4      17.0       +4              112       —
1873      11.54    11.19      +35          17.6      17.1       +5              175       —
1874      11.39    11.35       +4          17.0      17.0        0                0       —
1875      11.39    11.47       −8          16.7      16.7        0                0       —
1876      11.30    11.34       −4          16.5      16.2       +3                —      12
1877      11.75    11.18      +57          15.7      15.7        0                0       —
1878      10.87    11.28      −41          15.2      15.3       −1               41       —
1879      10.59    11.29      −70          14.4      15.1       −7              490       —
1880      11.88    11.29      +59          14.9      15.0       −1                —      59
1881      11.37    11.52      −15          15.1      15.1        0                0       —
1882      11.73    11.59      +14          15.5      15.2       +3               42       —
1883      12.04    11.27      +77          15.5      15.1       +4              308       —
1884      10.92    10.92        0          15.1      15.0       +1                0       —
1885      10.30    10.56      −26          14.5      14.7       −2               52       —
1886       9.63    10.25      −62          14.2      14.5       −3              186       —
1887       9.90    10.37      −47          14.4      14.5       −1               47       —
1888      10.51    10.55       −4          14.4      14.7       −3               12       —
1889      11.50    10.93      +57          15.0      15.0        0                0       —
1890      11.22    11.17       +5          15.5      15.2       +3               15       —
1891      11.52    11.17      +35          15.6      15.2       +4              140       —
1892      11.12    10.97      +15          15.4      15.2       +2               30       —
1893      10.50    10.85      −35          14.7      15.1       −4              140       —
1894      10.50    10.78      −28          15.0      15.2       −2               56       —
1895      10.61    10.81      −20          15.0      15.3       −3               60       —
1896      11.15    11.03      +12          15.7      15.6       +1               12       —
1897      11.27      —          —          16.0       —          —                —       —
1898      11.64      —          —          16.2       —          —                —       —


To obtain the measurement by comparing differences, the
table of which the first lines are given was completed.

                 Imports.                 Marriage Rate.
Year      X      DX     D²X        Y      DY     D²Y       DX.DY    D²X.D²Y

1845     330                      172
                 -15                       0                   0
1846     315            +21       172            -14                  -294
                  +6                     -14                 -84
1847     321            -36       158            +15                  -540
                 -30                      +1                 -30
1848     291            +91       159             +2                  +182

                            DX       D²X      DY      D²Y
Average . . . . . . .      15.74      1      -.19     .04
Standard Deviation . .     57.13     80       5.3     6.78

Sum of DX . DY = 8902. Sum of D²X . D²Y = 12076.

Hence r from first differences is .60 and from second differences .45.
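
The formation of the difference columns can be sketched as follows. This is an illustration only, not the original computation; the function name `differences` is an assumption, and the data are the opening four years of the table above, entered in units of the last printed digit.

```python
def differences(series):
    """Successive differences: series[i + 1] - series[i]."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Opening rows of the table, 1845 to 1848.
X = [330, 315, 321, 291]   # imports per head
Y = [172, 172, 158, 159]   # marriage rate

DX, DY = differences(X), differences(Y)        # first differences
D2X, D2Y = differences(DX), differences(DY)    # second differences

# Products whose sums enter the correlation of the differences.
first_products = [dx * dy for dx, dy in zip(DX, DY)]
second_products = [dx * dy for dx, dy in zip(D2X, D2Y)]

print(DX, DY)
print(D2X, D2Y)
print(first_products, second_products)
```

The sums of such products over the whole period, together with the averages and standard deviations of the difference columns, give the coefficients quoted above.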

Example 7. — In the experiment described on pp. 304-6, 
1,000 sums were formed each of the number of letters in 
10 words. Write A for the sum of the letters in the first 
5 words, B for the sum of the second 5, so that x = A + B. 
After each 10 a further 5 words were taken, for the sum of 
whose letters we write C ; then y was taken as B + C. We 
have thus 1,000 pairs, for which the correlation coefficient 
should be ½, with standard deviation .024.

Actually the correlation coefficient was .553, more than
twice the standard deviation from the fraction expected. A
possible explanation of this is in the want of complete
independence discussed on p. 306. The coefficients for four
separate groups of 250 (for which the standard deviation is
.047) were .56, .50, .58, .59.
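
The expected coefficient and its standard deviation can be checked with a few lines. This is a sketch under two assumptions: that A, B and C are independent and equally variable, and that the standard deviation of r is taken as (1 - r²)/√n, which reproduces the figures .024 and .047 quoted above. The function name is illustrative.

```python
import math

def sd_of_r(r, n):
    """Standard deviation of a correlation coefficient, (1 - r^2) / sqrt(n)."""
    return (1 - r * r) / math.sqrt(n)

# x = A + B and y = B + C; only B is common to the pair.
var_a = var_b = var_c = 1.0
r_expected = var_b / math.sqrt((var_a + var_b) * (var_b + var_c))

print(r_expected)                               # expected coefficient
print(round(sd_of_r(r_expected, 1000), 3))      # all 1,000 pairs
print(round(sd_of_r(r_expected, 250), 3))       # one group of 250
print((0.553 - r_expected) / sd_of_r(r_expected, 1000))  # observed excess in s.d. units
```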

The regression is nearly rectilinear in the central region 
from x = 40 to x = 61 ; outside these numbers there are less 
than 20 cases to one value of x, and the standard deviation 
of the average of an array is greater than 2, so that a comparison 
is not worth while. The standard deviations of the values of 
ȳ included are from 1.3 to 2.0.

Value     Average value of       Interquartile         y deduced
of x.     corresponding y's.     range of the y's.     from equation.

 40            48.3                   10½                  45.3
 41            44.8                   10                   45.8
 42            49.1                   11                   46.4
 43            47.4                   11                   46.9
 44            47.1                    9                   47.5
 45            46.4                    9½                  48.0
 46            48.9                   12                   48.5
 47            46.4                   10                   49.1
 48            50.2                   11                   49.6
 49            51.1                   12                   50.2
 50            51.5                   11½                  50.7
 51            49.6                    7                   51.3
 52            53.6                   13                   51.8
 53            53.6                   16                   52.3
 54            51.1                   10                   52.9
 55            51.9                    6                   53.4
 56            53.4                   14                   54.0
 57            52.7                   12                   54.5
 58            52.7                   11                   55.1
 59            60.2                    8                   55.6
 60            56                     11                   56.1
 61            57.2                   11                   56.7


The interquartile range as calculated from theory (.67 of
$2\sigma\sqrt{1 - r^2}$), formulae (26) and (107), is 10.5, to which the
observed ranges approximate, their average being 10.75. The
range appears to be independent of the value of x, as was to be
expected from the theory (formula (107)).

The correspondence of these numbers is evident from the
diagram, where the equation of the line of regression is

$$\frac{y - 51.50}{9.24} = .553 \times \frac{x - 51.46}{9.43}.$$
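
As a check on the column "y deduced from equation", the line of regression may be evaluated directly. This is a sketch only: the constant names are assumptions, and the coefficient .553 is the observed correlation reported for this experiment.

```python
R = 0.553                      # observed correlation coefficient
MEAN_X, SD_X = 51.46, 9.43     # average and standard deviation of x
MEAN_Y, SD_Y = 51.50, 9.24     # average and standard deviation of y

def y_from_x(x):
    """Regression of y on x: (y - mean_y) / sd_y = r (x - mean_x) / sd_x."""
    return MEAN_Y + R * SD_Y / SD_X * (x - MEAN_X)

def x_from_y(y):
    """Regression of x on y, with the roles of the two variables exchanged."""
    return MEAN_X + R * SD_X / SD_Y * (y - MEAN_Y)

for x in (40, 50, 61):
    print(x, round(y_from_x(x), 1))
```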
Number of letters experiment.
We obtain a better numerical view of the regression if we
group the numbers in wider grades, in which the error of
sampling is diminished, thus :

Grade       Number of     Average of     Average of cor-       x from
of y.       cases.        y.             responding x's.       equation.

30-39          85           36.3             43.8                42.9
40-49         348           44.7             47.7                47.6
50-59         360           54.3             52.7                52.0
60-69         173           63.0             57.7                58.0
70-79          30           72.7             64.1                63.4


Here the regression of x on y is taken ; in the diagram 
the regression is that of y on x. 

There are two examples below 30 and two above 80. 


There are various methods of comparing the distributions 
of observations with that given by the normal correlation 
surface, of which the simplest is as follows. 

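Take $r = \tfrac12$ as given a priori and $\sigma = 9.32$, the mean of
the standard deviations of x and y.

The equation of the surface is

$$z = \frac{1}{2\pi\sigma^2\sqrt{1 - r^2}}\, e^{-\frac{x^2 + y^2 - 2rxy}{2\sigma^2(1 - r^2)}}.$$

Write

$$x = \frac{X - Y}{\sqrt2}, \qquad y = \frac{X + Y}{\sqrt2}.$$

The equation becomes $z = \dfrac{1}{\sqrt3\,\pi\sigma^2}\, e^{-\frac{X^2}{3\sigma^2} - \frac{Y^2}{\sigma^2}}$ and represents
the surface referred to its principal axes, inclined at 45° to the
original axes.

The volume standing on the area bounded by $X = X_1$,
$X = X_2$, $Y = Y_1$, $Y = Y_2$ is

$$\int_{X_1}^{X_2} \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{X^2}{2\sigma_1^2}}\,dX \times \int_{Y_1}^{Y_2} \frac{1}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{Y^2}{2\sigma_2^2}}\,dY,$$

and can be obtained at once from the table on p. 271.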
Mark distances on the axis of X as in the figure, $OM_1$,
$M_1M_2$, . . . each equal to $\sigma/\sqrt2$. Write $\sigma_1$, $\sigma_2$ for the standard
deviations of X and Y. $\sigma_1/\sqrt3 = \sigma_2 = \sigma/\sqrt2$.

Then $OM_1 = M_1M_2 = \ldots = \sigma_1/\sqrt3 = .577\,\sigma_1$.

The proportional volumes of the solid bounded by vertical
planes perpendicular to the axis of X are then F(.577)
across $OM_1$, F(1.155) - F(.577) across $M_1M_2$, etc., which can
be found from the table on p. 271, as .2180, .1580, etc.

Now mark distances $ON_1$, $N_1N_2$ . . . on the axis of Y each
equal to $\sigma/\sqrt2$, that is to $\sigma_2$.

The proportional volumes bounded by vertical planes
perpendicular to the axis of Y are F(1) across $ON_1$, F(2) - F(1)
across $N_1N_2$, etc., viz. .3413, .1359, etc.

Since in the equation the integrals of X and Y are independent,
all sections perpendicular to the axis of X are cut
in the same proportions by the planes through $N_1$, $N_2$, etc.

Hence we have the following table which shows the proportions
of the volume of the normal surface standing on the
squares $ON_1K_1M_1$, . . . in the first line,
on $N_1N_2P_1K_1$, . . . in the second line, etc.


        Distribution on Squares of Normal Frequency Surface.

X/σ1 . . . . . . . . . . . .      .577    1.155    1.732    2.31     2.89     3.46
F(X/σ1) (differences) . . .      .2180    .1580    .0824    .0311    .0085    .0017

Y/σ2    F(Y/σ2)                            Products of Differences.
        Differences.

 1       .3413                   .0744    .0539    .0282    .0106    .0028    .0006
 2       .1359                   .0296    .0215    .0112    .0042    .0012    .0002
 3       .0214                   .0047    .0034    .0017    .0007    .0002    .0000
 4       .0014                   .0003    .0002    .0001    .0001    .0000    .0000
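
The entries of this table can be recomputed from the normal integral. The sketch below is illustrative only: it assumes F(t) denotes the area of the normal curve between the centre and t standard deviations, and obtains it from the error function instead of the printed table on p. 271, so the last decimal may differ slightly from the figures above.

```python
import math

def F(t):
    """Area of the normal curve between the centre and t standard deviations."""
    return 0.5 * math.erf(t / math.sqrt(2))

def band_areas(step, count):
    """Areas of successive bands, each `step` standard deviations wide."""
    return [F((i + 1) * step) - F(i * step) for i in range(count)]

x_bands = band_areas(1 / math.sqrt(3), 6)   # OM1, M1M2, ... in units of sigma_1
y_bands = band_areas(1.0, 4)                # ON1, N1N2, ... in units of sigma_2

# The proportion of the volume on each square is the product of the two band areas.
products = [[ya * xa for xa in x_bands] for ya in y_bands]

print([round(a, 4) for a in x_bands])
for ya, row in zip(y_bands, products):
    print(round(ya, 4), [round(p, 4) for p in row])
```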


The distribution is the same in each of the four quadrants 
formed by the axes OX and OY. 

The decimals in this table × 1000 give the theoretic
distribution of the 1,000 pairs of numbers if we neglect the
skewness.

The observations were marked in on squared paper and 
the number occurring in each of the X,Y squares was counted. 

The results are shown in the table on p. 394. The first line 
in each row repeats the theoretic numbers first given, the third 
gives the observations. 

The agreement is close within the three squares to right 
and left, and two squares above and below the centre, that is 
within $\pm 1.73\,\sigma_1$ and $\pm 2\,\sigma_2$. The probability of so much
divergence in a random selection is approximately £ (p. 433). 

To the left of these squares there is a falling-off in the 
observations (31 observations against 41 expected) and to the 
right an excess (54 observations against 41 expected). There 
is, however, a slight heaping up in the 12 squares to the left 
of the centre and a corresponding deficit to the right. These 
are exactly the phenomena we should expect from the skew- 
ness of the original curve (p. 304). The effect of the skewness 
is worked out in the note at the end of this chapter, and the 
results of the corrections are given in the second line of each 
row in the following table. The improvement is marked.
For example the expectation in the last three columns to
the left is now 33½ (31 observations) and in the last three
columns to the right is about 49 (54 observations).

1,000 Pairs of Totals of Letters.

Distribution of observations compared with normal and
with skew frequency.

The central horizontal and vertical lines are not the axes
of co-ordinates, but are the axes of symmetry, which are
inclined at 45°.


First lines.   Normal distribution . . . (thus 29.6)
Second lines.  Second approximation . . (thus 28.7)
Third lines.   Observations . . . . . . (thus 35)


[Table: twelve columns by eight rows of squares about the axes of symmetry, each square carrying the three entries listed above. The individual entries are illegible.]


The probability of the divergence from expectation as a 
whole has been tested (see p. 433), and is approximately $ ; that 
is, in only two such experiments out of five should we expect 
so close an agreement.* On the other hand, it is highly 
improbable that we should get so great divergence on the left 
and right if the distribution had been normal (and symmetrical) ; 

the second approximation is necessary for the completion of
the theory.

* The thick lines in the table are only to mark the regions to which the
test of p. 431 is applied.

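A simple method of testing the agreement of the distribution
of observations with that given by the normal surface
may be obtained by studying the distribution of the y-arrays,
instead of transferring as on p. 391 to axes of symmetry.

$$z = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\, e^{-\frac{1}{2(1 - r^2)}\left(\frac{x^2}{\sigma_1^2} + \frac{y^2}{\sigma_2^2} - \frac{2rxy}{\sigma_1\sigma_2}\right)} = \frac{1}{2\pi\sigma_1\sigma}\, e^{-\frac{x^2}{2\sigma_1^2} - \frac{y'^2}{2\sigma^2}},$$

where $\sigma = \sigma_2\sqrt{1 - r^2}$, $y' = y - \dfrac{rx\sigma_2}{\sigma_1}$, i.e. y' is measured parallel
to the axis of y from the line of regression, as in formula (107).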
Interval . . .      0-1      1-2      2-3
F(t) . . . . .     .341     .136     .022

Interval.   F(t)            Products.

  0-1       .341     .116     .047     .007
  1-2       .136     .047     .018     .003
  2-3       .021     .007     .003     .001


The division when the standard deviations are taken as 
units is shown in the table and diagram. 

The results of the words experiment tabulated on this
basis are :

   -3σ     -2σ     -σ      0      σ      2σ      3σ



The vertical columns show the y-arrays, and are comparable
with the more detailed setting out on p. 389. In each
compartment the calculated values are written above the number
of observations.

The effect of correcting for skewness would be to improve 
the correspondence in the same directions as in the former 
tabulation. 


Note on the Second Approximation to the Correlation Surface 

When terms of the order $1/\sqrt{n}$ are retained in the general law of great
numbers, a term involving the mean cube of error appears in the equation
(p. #98). Similarly Prof. Edgeworth shows* that the equation of the
correlation surface should under similar conditions be written

$$z = z_0 - \frac{1}{3!}\left(k_{30}\frac{\partial^3 z_0}{\partial x^3} + 3k_{21}\frac{\partial^3 z_0}{\partial x^2\,\partial y} + 3k_{12}\frac{\partial^3 z_0}{\partial x\,\partial y^2} + k_{03}\frac{\partial^3 z_0}{\partial y^3}\right) + \ldots \qquad (113)$$

where

$$z_0 = \frac{1}{2\pi\sqrt{1 - r^2}}\, e^{-\frac{1}{2(1 - r^2)}(x^2 + y^2 - 2rxy)},$$

x and y are the differences between the observations and their averages
divided by their standard deviations,

$k_{30}$ = mean $x^3$, $k_{21}$ = mean $x^2y$, $k_{12}$ = mean $xy^2$, $k_{03}$ = mean $y^3$.

In the example on pp. 388 seq. we have $x\sigma = A + B$, $y\sigma = B + C$, where
$\sigma = 9.32$ approx.

Mean $xy\sigma^2$ = mean $B^2$ = $\tfrac12\sigma^2$, and $r = \tfrac12$. (In the experiment r was found
to be .55.)

Mean $x^3$ = $k_{30}$ = $k_{03}$ = κ (as on p. 251) = .409.

Mean $x^2y$ = $k_{21}$ = mean $B^3 \div \sigma^3$ = ½ mean $(A + B)^3 \div \sigma^3$ = ½κ = $k_{12}$.

When the differentiations are performed with these values, we obtain

$$z = z_0\left\{1 - .01\,(x + y)(18 + 11xy - 8x^2 - 8y^2)\right\} = z_0(1 - \omega) \text{ say.}$$

The expression $z_0\omega$ is not readily integrable, and the simpler method of
procedure is to integrate $z_0$ over suitable areas, and to correct the results
approximately. An application of the method that leads to Simpson's rule
shows that if

$$z_{0,k},\quad z_{0,-k},\quad z_{h,0},\quad z_{-h,0}$$

are the ordinates of the four corners of a surface standing on a rectangular
base, whose diagonals are 2h, 2k, and $z_{00}$ is the ordinate at the centre of the
rectangle, then the mean ordinate is

$$\tfrac16\left(2z_{00} + z_{0,k} + z_{0,-k} + z_{h,0} + z_{-h,0}\right).$$

Hence if $z_0\omega$ is calculated for the four corners and centre of each of the
volumes tabulated on p. 393, we should reduce each of the volumes by a
quantity $z_0\omega'$, where $\omega'$ is the average of twice its central value and the four
angular values. This has been done throughout, and the values obtained
added to or subtracted from the numbers given by the normal curve to obtain
the corrected values on p. 394.
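
The correction factor and the averaging rule may be sketched as follows. This is an illustration, not the working used for the table on p. 394: the function names are assumptions, the cell chosen at the end is hypothetical, and x and y are taken in units of their standard deviations.

```python
def omega(x, y):
    """Skewness correction factor, so that z = z0 * (1 - omega)."""
    return 0.01 * (x + y) * (18 + 11 * x * y - 8 * x * x - 8 * y * y)

def mean_over_cell(f, xc, yc, h, k):
    """Mean of f over a cell centred at (xc, yc) with semi-diagonals h and k.

    Twice the central value plus the four corner values, divided by six.
    """
    corners = f(xc, yc + k) + f(xc, yc - k) + f(xc + h, yc) + f(xc - h, yc)
    return (2 * f(xc, yc) + corners) / 6

# Hypothetical cell, chosen only to exercise the two functions.
print(round(mean_over_cell(omega, 0.5, 0.5, 0.25, 0.25), 4))
```

The factor vanishes along the line x + y = 0 and changes sign when both x and y change sign, which is the asymmetry between the left and right of the table noted in the text.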

Notice that when the surface is referred to its principal axes by writing
$x + y = \sqrt2\,X$, $-x + y = \sqrt2\,Y$, ω becomes symmetrical in Y, but not
in X.


* Law of Error (Camb. Phil. Trans., Vol. XX, 1905, Part II, § 6), and
Statistical Journal, 1917, pp. 268 seq. The standard deviation is used as
unit in the text instead of the modulus $\sqrt2\,\sigma$ used by Edgeworth.




CHAPTER VIII. 


PARTIAL AND MULTIPLE CORRELATION. 


The investigation of Chapter VII has shown how the 
variations in one quantity are related to the variations in 
another by which it is influenced. It frequently happens, 
however, that the movements of a variable are related to the 
movements of a number of others. The frequency distribu- 
tion can then no longer be represented by a surface in three 
dimensions, but an analogous function is obtained of which 
the form already given is a simple case. 

The regression equation is no longer that of a line or 
curve, but an expression connecting one selected variable
with a number of others; we can then isolate the effect of 
any one of the remaining variables (by a method involving 
and similar to that of partial differentiation), and so obtain 
the relation between any pair of variables, abstraction being 
made of the remainder. This is the very important method 
of partial correlation. 

In the sequel the case of three variables is handled in 
detail, and the more general solution is summarized. 

Let there be three variables, which measured from their 
means are x, y, z, and let them be correlated each with each. 

Suppose that they are so connected that $z = ax + by + c$
is an ideal plane giving the mean value of z corresponding to
a pair of values x, y.

Required to find a, b, c so that the observed deviations of
observed values of z from the values given by this equation
have the least improbability.

Let $\bar z_s$ be the average of $k_s$ observations, each of which has
for its x, y members $x_s$ (to $x_s + \delta x$) and $y_s$ (to $y_s + \delta y$).

Write $\eta_s$ for $\bar z_s - (ax_s + by_s + c)$, i.e. the deviation of the
mean of the observations in the s-th group from its ideal value.
Let $\sigma_x$, $\sigma_y$, $\sigma_z$ be the standard deviations of the frequency
groups of x, y, z. Then, if in the long run the standard deviation
of a group in z is independent of the values of x and y, the
standard deviation of $\eta_s$ is $\sigma/\sqrt{k_s}$ (formula (38)), where σ is the
constant standard deviation for an array, and the probability of the
occurrence of $\eta_s$ (to $\eta_s + \delta\eta$) is $K e^{-k_s\eta_s^2/2\sigma^2}\,\delta\eta$.

Let there be n pairs of values such as $x_s$, $y_s$ and N observations
in all, so that $N = k_1 + \ldots + k_s + \ldots + k_n$.

The probability of the concurrence of $\eta_1 \ldots \eta_s \ldots \eta_n$ is
$C e^{-\phi/2\sigma^2}$, where $\phi = k_1\eta_1^2 + \ldots + k_s\eta_s^2 + \ldots + k_n\eta_n^2$ and C is
constant.

The probability is greatest when $\phi = S_1^n\, k_s[\bar z_s - (ax_s + by_s + c)]^2$
is least, and a, b, and c must be chosen to give this result.

$$\phi = S(k_s\bar z_s^2) + a^2S(k_sx_s^2) + b^2S(k_sy_s^2) + c^2Sk_s - 2aS(k_sx_s\bar z_s) - 2bS(k_sy_s\bar z_s) - 2cS(k_s\bar z_s) + 2abS(k_sx_sy_s) + 2acSk_sx_s + 2bcSk_sy_s.$$

Here $Sk_sx_s = 0 = Sk_sy_s$. $Sk_s\bar z_s$ = sum of all values of z = 0.

$Sk_sx_s^2 = N\sigma_x^2$, $x_s$ being repeated $k_s$ times in the whole group,
and $Sk_sy_s^2 = N\sigma_y^2$.

$Sk_sx_s\bar z_s = Sxz$, since $k_s\bar z_s$ = sum of values of z in the group,
and $Sk_sy_s\bar z_s = Syz$. Also $Sk_sx_sy_s = Sxy$.

$$\phi = S(k_s\bar z_s^2) + Na^2\sigma_x^2 + Nb^2\sigma_y^2 - 2aSxz - 2bSyz + 2abSxy + Nc^2.$$

Write $Sxy = Nr_{xy}\sigma_x\sigma_y$, $Sxz = Nr_{xz}\sigma_x\sigma_z$, $Syz = Nr_{yz}\sigma_y\sigma_z$.

Then φ is a minimum, when $\dfrac{\partial\phi}{\partial a}$, $\dfrac{\partial\phi}{\partial b}$, $\dfrac{\partial\phi}{\partial c}$ are each zero,* i.e.,
when

$$0 = \frac{\partial\phi}{\partial c} = 2Nc \text{ and } c = 0,$$

$$0 = \frac{\partial\phi}{\partial a} = 2N(a\sigma_x^2 + b\sigma_x\sigma_yr_{xy} - \sigma_x\sigma_zr_{xz}),$$

and

$$0 = \frac{\partial\phi}{\partial b} = 2N(a\sigma_x\sigma_yr_{xy} + b\sigma_y^2 - \sigma_y\sigma_zr_{yz}).$$

Hence

$$a\sigma_x + b\sigma_yr_{xy} = \sigma_zr_{xz}$$

and

$$a\sigma_xr_{xy} + b\sigma_y = \sigma_zr_{yz},$$

$$\frac{a\sigma_x}{r_{xz} - r_{xy}r_{yz}} = \frac{b\sigma_y}{r_{yz} - r_{xz}r_{xy}} = \frac{\sigma_z}{1 - r_{xy}^2}. \qquad (114)$$


* The values of a, b and c can be obtained without differentiation by
expressing φ as the sum of squares.
The equation $z = ax + by + c$ becomes

$$\frac{z}{\sigma_z} = R_x\,\frac{x}{\sigma_x} + R_y\,\frac{y}{\sigma_y}, \qquad (115)$$

where $R_x = \dfrac{r_{xz} - r_{xy}r_{yz}}{1 - r_{xy}^2}$, $R_y = \dfrac{r_{yz} - r_{xy}r_{xz}}{1 - r_{xy}^2}$.

$R_x$ and $R_y$ are called partial regression coefficients between z, x
and z, y; for a given y, $z = R_x\dfrac{\sigma_z}{\sigma_x}x$ + const., and for a given x,
$z = R_y\dfrac{\sigma_z}{\sigma_y}y$ + const., formulae which may be compared with
$y = r\dfrac{\sigma_y}{\sigma_x}x$ given above (p. 362).

Similar equations are of course to be obtained when x or y
are expressed in terms of y, z or x, z.

The partial correlation coefficient between x and z (y constant)
is defined, by analogy with the case of only two variables, as the
geometric mean of the partial regression coefficients found respectively
when z is expressed in terms of x and y as in (115) and when
x is expressed in terms of z and y; it is therefore

$$\frac{r_{xz} - r_{xy}r_{yz}}{\sqrt{1 - r_{xy}^2}\,\sqrt{1 - r_{yz}^2}}.$$

The foregoing analysis is based on Mr. Yule’s paper 
(Statistical Journal, 1897, pp. 831 seq .) and book, and to him 
is due a great part of the work on this subject. The treatment 
here differs from his only by the important consideration that 
it is based on the prevalence of the law of error as discussed 
above (p. 298), and that it makes the assumption that the 
standard deviation of z is independent of the values of x and y, 
which is by no means universal ; while Mr. Yule does not need 
this assumption, but uses the method of least squares, a method 
which is not used (except very rarely) in this book, because of
the difficulties that underlie the principles involved. 

The equation between z and x and y is the same as Mr. 
Yule’s, and also the same as obtained (see p. 405 below) from 
the theory of normal multiple correlation. 


Example 1 . — The Cost of Living Committee, 1918, collected 
a number of budgets of the weekly expenditure on food in 
working-class families (see p. 310) ; 390 of these, obtained from 
families of the skilled classes, were grouped according to the
numbers of persons in the households above and below 14 years
of age (see Statistical Journal, 1919, p. 360).

The notation and quantities involved are as follows :

                             Expenditure on      Number over     Number under
                             food.               14 years.       14 years.

Average . . . . . . . .      51s.                2.48            3.56
Difference from average      z × 5s.             x               y
Standard deviation . . .     σz = 3.03 × 5s.     σx = .836       σy = 1.40

$r_{xy} = -.0525$, $r_{zx} = .504$, $r_{zy} = .318$.

The equation obtained is $z/\sigma_z = .52\,x/\sigma_x + .35\,y/\sigma_y$, which
leads to the formula :

Expenditure (shillings) = 14.5 + 9.4 × number over 14 years
+ 3.7 × number of children under 14 years,

and to the following table :


                      Family Expenditure on Food.* (Shillings.)

Number of                By Formula.                      Average of actual cases.
persons over         Number of children.                    Number of children.
14 years.          2       3       4       5          2           3           4           5

2 . . .          40.7    44.4    48.1    51.8     40.5 (74)   45.2 (74)   47.1 (53)   52.9 (25)
3 . . .          50.1    53.8    57.5    61.2     54.8 (21)   51.2 (17)   58.2 (16)   64.9 (17)
4 . . .          59.5    63.2    66.9    70.6     58.0 (10)   60.2 (10)   78.1 (6)     ..  (0)

The numbers in brackets are the numbers of actual cases averaged.


The agreement between experience and formula is as close 
as can be expected, when the considerable standard deviation 
and the small numbers of cases are considered. 
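
The "By Formula" half of the table follows directly from the expenditure formula. A minimal sketch, with an illustrative function name, using the coefficients 14.5, 9.4 and 3.7 stated above:

```python
def food_expenditure(adults, children):
    """Weekly food expenditure in shillings given by the fitted formula."""
    return 14.5 + 9.4 * adults + 3.7 * children

for adults in (2, 3, 4):
    row = [round(food_expenditure(adults, c), 1) for c in (2, 3, 4, 5)]
    print(adults, row)
```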

From this example it becomes clear that the method of 
partial correlation is closely akin to the ordinary way of 
analysis in cross-tabulation ; but the use of the formula brings 
the separate results into coherent relation. Here we have 
the result that on the average an additional adult (who 
generally increases family income) adds 9s. 5d. to the family
food expenditure, while an additional child adds only 3s. 8d. ;
the greater number of children the lower is the standard of
living, since a child's nourishment costs about two-thirds of
that of an adult. (Here "adult" is used for a person over
14 years.)

* For the working out of these figures and those on p. 310 I am indebted
to Miss King and Miss Mackenzie at the School of Economics.

Example 2. — The following data are obtained for the County 
of London from the Census of 1911.* 

z + 3.7 = the number of rooms to a tenement.

x + 4.15 = the number of persons in a family.

y + .86 = the number of children under 10 in a family.

The averages for the county are 3.7, 4.15 and .86 respectively.

$r_{xy} = .57$, $r_{xz} = .44$, $r_{yz} = -.03$, $\sigma_z = 2.59$, $\sigma_x = 2.32$, $\sigma_y = 1.24$,

$R_x = .676$, $R_y = -.402$.

The figures relate to 1,023,951 families, sufficient for
accuracy to three significant figures.

$$z = x \times \frac{\sigma_z}{\sigma_x} \times .676 - y \times \frac{\sigma_z}{\sigma_y} \times .402 = x \times .754 - y \times .840$$

or (rooms - 3.7) = .754 (persons - 4.15) - .84 (children - .86).

The number of rooms for families of given size decreases 
rapidly as the number of children increases. 

We may also write : — 

Number of rooms = 1.29 + .75 persons - .84 children
                = 1.29 + .75 persons over 10
                       - .09 children under 10.
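
The constant 1.29 and the net coefficient for a child can be recovered from the deviation form of the equation. A short sketch, in which the constant and function names are assumptions and the figures are those of this example:

```python
MEAN_ROOMS, MEAN_PERSONS, MEAN_CHILDREN = 3.7, 4.15, 0.86
B_PERSONS, B_CHILDREN = 0.754, -0.84    # coefficients of the deviation form

def rooms(persons, children):
    """Average rooms for a family of `persons`, of whom `children` are under 10."""
    return (MEAN_ROOMS
            + B_PERSONS * (persons - MEAN_PERSONS)
            + B_CHILDREN * (children - MEAN_CHILDREN))

print(round(rooms(0, 0), 2))               # constant term of the second form
print(round(B_PERSONS + B_CHILDREN, 2))    # net effect of one child under 10
```

Since a child under 10 is also counted among the persons, its net coefficient is the sum of the two, which gives the -.09 of the last line.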

Example 3. — In the research on the social conditions in 
Reading described in Livelihood and Poverty the income, 
rent, and family constitution were tabulated for 586 families. 
Rent increases with income and with the number of earners, 
but for the same income and the same number of persons it 
may be that the more numerous the children the less can be 
afforded for rent. 

Rent : z + 6.075 shillings, where 6.075 shillings is the
average.

Number of equivalent persons : x + 3.287, where 3.287 is
the average.


* I am indebted to Mr. J . W. Nixon for this calculation. 
Income : y + 31.712 shillings, where 31.712 shillings is the
average.

The number of "equivalent" persons was obtained by
classifying adults and children on a somewhat arbitrary scale
according to the house-room they may be presumed to need ;
children under 5 years were counted as J, children from 5 to 14 
as $, boys of 14 to 18 and girls from 14 to 16 as f, and older 
persons as 1. 

The correlation between rent and number of rooms is 
close, so that rent may be taken as measuring house-room. 

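$\sigma_z = 1.333$, $\sigma_x = 1.22$, $\sigma_y = \ldots$, $r_{xy} = .543$, $r_{xz} = .152$, $r_{yz} = .458$,

$R_x = -.136$, $R_y = .532$.

Hence $\dfrac{z}{\sigma_z} = -.136\,\dfrac{x}{\sigma_x} + .532\,\dfrac{y}{\sigma_y}$,

or $z = -.148x + .0544y$.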

House-room then decreases perceptibly as the size of the 
family increases, for given incomes. 
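
The partial regression coefficients of formula (115) and the partial correlation coefficient may be computed as below. This is a sketch with illustrative function names; the three correlations are those of the Reading example (.543, .152, .458), and the last decimal of $R_x$ may differ from the printed figure through rounding of the correlations.

```python
import math

def partial_regression(r_xy, r_xz, r_yz):
    """Coefficients R_x, R_y in z/s_z = R_x * x/s_x + R_y * y/s_y."""
    d = 1 - r_xy ** 2
    return (r_xz - r_xy * r_yz) / d, (r_yz - r_xy * r_xz) / d

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and z with y held constant."""
    return (r_xz - r_xy * r_yz) / math.sqrt((1 - r_xy ** 2) * (1 - r_yz ** 2))

# x = equivalent persons, y = income, z = rent.
r_xy, r_xz, r_yz = 0.543, 0.152, 0.458
R_x, R_y = partial_regression(r_xy, r_xz, r_yz)

print(round(R_x, 3), round(R_y, 3))
print(round(partial_correlation(r_xy, r_xz, r_yz), 3))
```

The negative sign of both $R_x$ and the partial correlation expresses the conclusion of the text: for given income, house-room falls as the family grows.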

Each of the three examples shows that families with 
children tend to secure relatively less food and less house- 
room per head than families without children, and to some 
extent measures the loss. 


We have still to consider the theoretic distribution in three 
dimensions of three variables, corresponding to the normal 
correlation surface for two variables. The following pages 
show the results and the analysis in simple cases. It will be 
observed that the same lines of proof are followed as in the 
case of two variables. 

Multiple Correlation Surface. 

The following analysis is only valid on the assumption 
that the elements have normal frequency. 

Let X, Y, Z be three quantities which depend on other quantities
$U, V_1, V_2, V_3$ in such a way that $X = U + V_1$, $Y = U + V_2$,
$Z = U + V_3$.

Let $U, V_1, V_2, V_3$ be chosen at random from normal frequency
groups whose averages are $\bar U, \bar V_1 \ldots$ and standard deviations
$\sigma_u, \sigma_{v_1} \ldots$

Let $\bar X, \bar Y, \bar Z$ be the averages and $\sigma_x, \sigma_y, \sigma_z$ the standard deviations
of X, Y, Z, and let $X = \bar X + x \ldots U = \bar U + u \ldots$

Then in the long run $\bar X = \bar U + \bar V_1$ etc. and

$$x = u + v_1,\quad y = u + v_2,\quad z = u + v_3.$$

Suppose $u, v_1, v_2, v_3$ quite independent of each other, so that
mean $uv_1 = 0 =$ mean $v_1v_2$ etc.

Write $r_{xy}$ for the coefficient of correlation between x and y.

Then $\sigma_x^2 = \sigma_u^2 + \sigma_{v_1}^2 \ldots$,

and $\sigma_x\sigma_yr_{xy} = \text{mean}\,(u + v_1)(u + v_2) = \sigma_u^2 = \sigma_y\sigma_zr_{yz} = \sigma_z\sigma_xr_{zx}$.

The joint chance of selected values of $x_1, y_1, z_1$ arising from
particular values $u, v_1, v_2, v_3$ is

$$P_u = \frac{1}{\sigma_u\sqrt{2\pi}}\,e^{-\frac{u^2}{2\sigma_u^2}} \times \frac{1}{\sigma_{v_1}\sqrt{2\pi}}\,e^{-\frac{v_1^2}{2\sigma_{v_1}^2}} \times \ldots$$

subject to the conditions $x_1 = u + v_1$, $y_1 = u + v_2$, $z_1 = u + v_3$.
Eliminate $v_1, v_2, v_3$.

The joint chance of $x_1, y_1, z_1$ arising from a particular value u
is given by

$$-2\log\left(P_u \cdot 4\pi^2\sigma_u\sigma_{v_1}\sigma_{v_2}\sigma_{v_3}\right) = \frac{u^2}{\sigma_u^2} + \frac{(u - x_1)^2}{\sigma_{v_1}^2} + \frac{(u - y_1)^2}{\sigma_{v_2}^2} + \frac{(u - z_1)^2}{\sigma_{v_3}^2} = a(u - b)^2 + c,$$

where

$$a = \frac{1}{\sigma_u^2} + \frac{1}{\sigma_{v_1}^2} + \frac{1}{\sigma_{v_2}^2} + \frac{1}{\sigma_{v_3}^2},\qquad ab = \frac{x_1}{\sigma_{v_1}^2} + \frac{y_1}{\sigma_{v_2}^2} + \frac{z_1}{\sigma_{v_3}^2},$$

$$c = \frac{x_1^2}{\sigma_{v_1}^2} + \frac{y_1^2}{\sigma_{v_2}^2} + \frac{z_1^2}{\sigma_{v_3}^2} - ab^2.$$

The whole chance of the selected values $x_1, y_1, z_1$ arising from any
value of u is $P = \int_{-\infty}^{\infty} P_u\,du$, $x_1, y_1, z_1$ being regarded as constant,

$$= \frac{e^{-\frac{c}{2}}}{(2\pi)^{3/2}\sigma_u\sigma_{v_1}\sigma_{v_2}\sigma_{v_3}} \int_{-\infty}^{\infty} \frac{e^{-\frac{a}{2}(u - b)^2}}{\sqrt{2\pi}}\,du = \frac{e^{-\frac{c}{2}}}{(2\pi)^{3/2}\sigma_u\sigma_{v_1}\sigma_{v_2}\sigma_{v_3}\sqrt{a}}.$$

Write x, y, z for $x_1, y_1, z_1$.
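Now

$$a = \frac{1}{\sigma_u^2} + \frac{1}{\sigma_x^2 - \sigma_u^2} + \frac{1}{\sigma_y^2 - \sigma_u^2} + \frac{1}{\sigma_z^2 - \sigma_u^2} = \frac{\sigma_x^2\sigma_y^2\sigma_z^2 - \sigma_u^4(\sigma_x^2 + \sigma_y^2 + \sigma_z^2) + 2\sigma_u^6}{\sigma_u^2\sigma_{v_1}^2\sigma_{v_2}^2\sigma_{v_3}^2}$$

and

$$ab = \frac{x}{\sigma_x^2 - \sigma_u^2} + \frac{y}{\sigma_y^2 - \sigma_u^2} + \frac{z}{\sigma_z^2 - \sigma_u^2}.$$

$$c\,\{\sigma_x^2\sigma_y^2\sigma_z^2 - \sigma_u^4(\sigma_x^2 + \sigma_y^2 + \sigma_z^2) + 2\sigma_u^6\} = x^2(\sigma_y^2\sigma_z^2 - \sigma_u^4) + \ldots + \ldots - 2xy\,\sigma_u^2(\sigma_z^2 - \sigma_u^2) - \ldots - \ldots$$

Write $R\,\sigma_x^2\sigma_y^2\sigma_z^2$ for $\sigma_x^2\sigma_y^2\sigma_z^2 - \sigma_u^4(\sigma_x^2 + \sigma_y^2 + \sigma_z^2) + 2\sigma_u^6$,

so that $R = 1 + 2r_{xy}r_{yz}r_{zx} - r_{xy}^2 - r_{yz}^2 - r_{zx}^2$.

The chance of the concurrence of x, y, z is then

$$P = \frac{1}{(2\pi)^{3/2}\sigma_x\sigma_y\sigma_zR^{1/2}}\; e^{-\frac{1}{2R}\left\{\frac{x^2}{\sigma_x^2}(1 - r_{yz}^2) + \ldots + \ldots - \frac{2xy}{\sigma_x\sigma_y}(r_{xy} - r_{xz}r_{yz}) - \ldots - \ldots\right\}}, \qquad (116)$$

for

$$\frac{r_{xy} - r_{xz}r_{yz}}{\sigma_x\sigma_y} = \frac{\sigma_x\sigma_yr_{xy}\,\sigma_z^2 - \sigma_x\sigma_zr_{xz} \cdot \sigma_y\sigma_zr_{yz}}{\sigma_x^2\sigma_y^2\sigma_z^2} = \frac{\sigma_u^2\sigma_z^2 - \sigma_u^2 \cdot \sigma_u^2}{\sigma_x^2\sigma_y^2\sigma_z^2} = \frac{\sigma_u^2\sigma_{v_3}^2}{\sigma_x^2\sigma_y^2\sigma_z^2}.$$

In the special case where

$$\sigma_u = \sigma_{v_1} = \sigma_{v_2} = \sigma_{v_3},\qquad \sigma_x^2 = 2\sigma_u^2 = \sigma^2, \text{ say,}$$

$r_{xy} = \tfrac12 = r_{yz} = r_{zx}$ and $R = \tfrac12$.

The chance is then

$$\frac{\sqrt2}{(2\pi)^{3/2}\sigma^3}\; e^{-\frac{1}{4\sigma^2}\{3(x^2 + y^2 + z^2) - 2(xy + yz + zx)\}}.$$

The most probable value of z for given values of x and y is
obtained from $\dfrac{\partial P}{\partial z} = 0$, and is

$$\frac{z}{\sigma_z}(1 - r_{xy}^2) = \frac{x}{\sigma_x}(r_{xz} - r_{xy}r_{yz}) + \frac{y}{\sigma_y}(r_{yz} - r_{xy}r_{xz}),$$

as in formula (115).

In the special case this becomes $z = \tfrac13(x + y)$.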
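
The special case is easily verified with exact fractions. The sketch below, with illustrative function names, evaluates R and the most probable z when every correlation is ½ and the three standard deviations are equal (taken as unity):

```python
from fractions import Fraction

def big_r(r_xy, r_yz, r_zx):
    """R = 1 + 2 r_xy r_yz r_zx - r_xy^2 - r_yz^2 - r_zx^2."""
    return 1 + 2 * r_xy * r_yz * r_zx - r_xy ** 2 - r_yz ** 2 - r_zx ** 2

def most_probable_z(x, y, r_xy, r_xz, r_yz):
    """Most probable z for given x and y, all in units of their standard deviations."""
    return (x * (r_xz - r_xy * r_yz) + y * (r_yz - r_xy * r_xz)) / (1 - r_xy ** 2)

half = Fraction(1, 2)
print(big_r(half, half, half))                     # value of R
print(most_probable_z(3, 6, half, half, half))     # one third of (3 + 6)
```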


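It is shown by Elderton (following Pearson) that if x, y and z are the sum
of any finite number of variables, such as the u, v, . . . above, all of normal
frequency, and some common to a pair (x, y), (y, z) or (z, x) and others occurring
in only one of these, then P is of the form $Ke^{-(ax^2 + by^2 + cz^2 + 2fyz + 2gzx + 2hxy)}$
where a, b, c, f, g, h are constants to be determined.

Take the aggregate of the chances to be unity.

Let A, B, C, F, G, H be the minors of the determinant

$$\Delta = \begin{vmatrix} a & h & g \\ h & b & f \\ g & f & c \end{vmatrix},$$

so that $A = bc - f^2$, $F = hg - af$, . . . $BC - F^2 = a\Delta$, . . .

Then

$$-\log\frac{P}{K} = a\left(x + \frac{h}{a}y + \frac{g}{a}z\right)^2 + \frac{C}{a}\left(y - \frac{F}{C}z\right)^2 + \frac{\Delta}{C}z^2.$$

Since the aggregate of the chances is unity, $K\pi^{3/2}\Delta^{-1/2} = 1$, and

$$\sigma_z^2 = \iiint z^2P\,dx\,dy\,dz = \frac{C}{2\Delta}.$$

Similarly

$$\sigma_y^2 = \frac{B}{2\Delta},\qquad \sigma_x^2 = \frac{A}{2\Delta}.$$

$$r_{yz}\sigma_y\sigma_z = \iiint P\,yz\,dx\,dy\,dz = K\sqrt\pi\,a^{-1/2}\iint yz\,e^{-\frac{C}{a}\left(y - \frac{F}{C}z\right)^2 - \frac{\Delta}{C}z^2}\,dy\,dz = K\sqrt\pi\,a^{-1/2}\iint z\left(y' + \frac{F}{C}z\right)e^{-\frac{C}{a}y'^2 - \frac{\Delta}{C}z^2}\,dy'\,dz,$$

where $y' = y - \dfrac{F}{C}z$ and the limits of the integration are ± ∞,

$$= \frac12\,\frac{F}{\Delta}.$$

Similarly $\sigma_x\sigma_yr_{xy} = \dfrac12\,\dfrac{H}{\Delta}$ and $\sigma_x\sigma_zr_{xz} = \dfrac12\,\dfrac{G}{\Delta}$.

$$a\Delta = BC - F^2 = 4\sigma_y^2\sigma_z^2(1 - r_{yz}^2)\Delta^2$$

$$f\Delta = GH - AF = 4\sigma_x^2\sigma_y\sigma_z(r_{xy}r_{xz} - r_{yz})\Delta^2$$

$$\Delta^2 = ABC + 2FGH - AF^2 - BG^2 - CH^2 = 8\Delta^3\sigma_x^2\sigma_y^2\sigma_z^2\{1 + 2r_{xy}r_{yz}r_{zx} - r_{xy}^2 - r_{yz}^2 - r_{zx}^2\}.$$

Write R for the quantity inside the bracket.

$$R = \begin{vmatrix} 1 & r_{xy} & r_{xz} \\ r_{yx} & 1 & r_{yz} \\ r_{zx} & r_{zy} & 1 \end{vmatrix}.$$

Then $\Delta = \dfrac{1}{8R\sigma_x^2\sigma_y^2\sigma_z^2}$, $a = \dfrac{1 - r_{yz}^2}{2R\sigma_x^2}$, $f = \dfrac{r_{xy}r_{xz} - r_{yz}}{2R\sigma_y\sigma_z}$.

Hence

$$P = \frac{1}{\sqrt R\,(2\pi)^{3/2}\sigma_x\sigma_y\sigma_z}\; e^{-\frac{1}{2R}\left\{\frac{x^2}{\sigma_x^2}(1 - r_{yz}^2) + \ldots - \frac{2yz}{\sigma_y\sigma_z}(r_{yz} - r_{xy}r_{xz}) - \ldots\right\}}$$

as obtained in the special case above.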


If instead of %, y, t there are « quantities x t x M x n a more genera 1 

proof on the same lines (due to Prof. Karl Pearson) leads to 

I - R n + + +*^~ R u+ +} 

p t g 2K Wi* 11 9 X <T % ■■ e, * 

» « 

VR (2ir)* . » 
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where R « 


•'as • • • 


, r m being the same quantity as r a , and 


R* being the minor obtained by crossing out the 5 th column and row, with 
its appropriate sign. 

. The following note indicates the course of the proof. 

As before P = K* where <p ■» a n x x * + . . .4>2a,^r 1 jr l 4- . . a,, . . . . o, t . . . 
being constants. 


Let A- 


where a* «= and let A „ be a minor 


&n\ <*«t . • • 

obtained by crossing out the 5 th column and row. 

Ann *= A n _j. 

Then <p may be expanded so that x t occurs only in the first term, x t onh 
in the first two terms, etc., and then 

♦ - + £*.+• • •)’+ +-)■+-+ 

jis may be shown by a rather troublesome induction. 

P . ... . K J. (4)'(: f ;)' . . • ^ 

'«*-//••• “ 1 • * 


Similarly, by changing the order, <r t * = 


A„ 


, r w> n—i J J 


Similarly, by changing the order, 


1/^ _ A *- " y * 1 


d*»-i<lx m — - '■ 


<r # <r t r rt • 


2A„ 


Substitute these values for r* etc. in tho determinant giving R, and we 
obtain 

A n n_l 


R - 


* 1 

A„ 

A„ 

. . . A ln 

(lA.O’V,* . 

. . <r, t * 

A,i 

A„ 

. . . A t n 



An,' 

A„ 

. . • An- 


(2An) n a 1 * . . © n *‘ 


by a well-known theorem in determinants 
1 . .5 

g = (2w) *cr 1 . . . <r n 


. v/R. 
Rn


I 

(2A M ) n ~ 1 <r t * . . . <r n * 


well-known theorem. 


A m . . . 

A,. 

Ani • • < 

’ Ann 


*ll*n*’' 

( 2 A n ) n “ 1 <r 1 * . . . <r n ,# 


by another 


«u 


R n 

aRV 


R R • 

Similarly a,, =» — - 1 — , and by changing the order a* — ^ *- , and hence 
2 K(T|(r t 2K<r # <r t 

we obtain the formula as given at the bottom of p. 406. 

The most probable value of x x for given values of x tf is then 

given by 




CHAPTER IX. 


PRECISION OF MEASUREMENTS OF AVERAGES, 
MOMENTS AND CORRELATION* 

Inverse Probability. 

In the previous chapters the problems of the errors that 
arise in the process of sampling have been chiefly discussed 
from the point of view of the universe, not of the sample ; 
that is, the question has been how far will a sample represent 
a given universe ? The practical question is, however, the 
converse : what can we infer about a universe from a given 
sample ? This involves the difficult and elusive theory of
inverse probability, for it may be put in the form, which of 
the various universes from which the sample may a priori 
have been drawn may be expected to have yielded that 
sample ? 

To make the argument clear it seems expedient to make a 
short digression on the theory of inverse probability. The 
following examples illustrate the problem and its solution. 

A sovereign and two shillings were in a purse. One coin 
is lost. One of the remaining two is taken out and is found 
to be a shilling. What is the chance that the sovereign was 
lost ? 

The a priori chance that the sovereign was lost is $p'_1 = \tfrac13$,
and that a shilling was lost $p'_2 = \tfrac23$, if we assume that the loss
of any one coin was as likely as any other.

If the sovereign was lost, the chance of drawing a shilling
was $p_1 = 1$, since there is no other to draw.


* See Edgeworth in Statistical Journal, 1908, pp. 381 seq. ; Yule, Introduction
to the Theory of Statistics, last chapter ; Transactions of the Royal
Society, Pearson and Filon, Vol. 191 (A. 220), and Sheppard, Vol. 192 (A. 229),
1898 ; Biometrika, Vol. II, Part III, p. 280.
If a shilling was lost, the chance of drawing a shilling was
$p_2 = \tfrac12$.

The a priori chance that it should be a sovereign that is
lost and a shilling that is drawn is $p'_1 p_1 = \tfrac13$.

The a priori chance that it should be a shilling that is lost
and a shilling that is drawn is $p'_2 p_2 = \tfrac23 \times \tfrac12 = \tfrac13$.

By hypothesis one of these equally probable double events 
has happened, and there is nothing in the data to show which. 

It is therefore just as likely that the third coin is a sovereign 
as a shilling. 

We may generalise this proposition in the following way.
Of various possible events whose chances of occurrence
are respectively $p'_1, p'_2 \ldots p'_t \ldots$ one is known to have taken
place. A further result is found, whose probability, if the
first, second ... t-th ... event had happened, would have been
$p_1, p_2 \ldots p_t \ldots$ respectively.

The a priori chance that the t-th event happened and produced
the result is $p'_t \times p_t$.

A priori the chances that the events of the first series would
happen and produce the result are in the ratios

$$p'_1p_1 : p'_2p_2 : \ldots : p'_tp_t : \ldots = P_1 : P_2 : \ldots : P_t : \ldots$$

But we know that one or other of these did happen, and
this additional knowledge does not affect the relative magnitudes
$P_1, P_2, \ldots$, but raises their total in such a ratio, K, that
it equals 1, which represents certainty on the scale of algebraic
chance. Hence $K \cdot SP_t = 1$, and the chance that it was the t-th
event in the first series is

$$KP_t = \frac{p'_t p_t}{p'_1p_1 + \ldots + p'_tp_t + \ldots}$$


In a bag there are six similar balls which are known to be
black or white. One is drawn and is found to be white. What
can be inferred as to the original number of white balls in the
bag ?

The answer depends on the hypothesis made as to the
a priori chances of distribution between black and white.

If a priori each ball is equally likely to be black or white
and $p'_t$ is the chance that t were white, $p'_t = \dfrac{{}^6C_t}{2^6}$.

$p_t = \dfrac{t}{6}$ whatever the hypothesis.

$P_0 : P_1 : \ldots : P_6 = 0 : 6 : 30 : 60 : 60 : 30 : 6$, each $P_t$ being the
corresponding number divided by $6 \times 2^6$.

$SP_t = \tfrac12$ and $K = 2$.

The chances that there were 0, 1, 2, 3, 4, 5, 6 white balls
are respectively 0, 1/32, 5/32, 10/32, 10/32, 5/32, 1/32.

But if the number of white in the bag had been determined
by throwing a die and taking the number on the upper face,
then

$$p'_1 = p'_2 = \ldots = p'_6 = \tfrac16 ; \quad P_t = \tfrac16 \cdot \tfrac{t}{6} ; \quad SP_t = \tfrac{21}{36}, \quad K = \tfrac{12}{7}, \quad \text{and } KP_t = \tfrac{t}{21}.$$

More generally if there were n balls in the bag and the
number of white had been determined by spinning a disc,
marked on its circumference with the numbers 0, 1 ... n
equally spaced, on a vertical axis, and taking the number
nearest a fixed point adjacent to its circumference when it
came to rest, then

$$p'_t = \frac{1}{n+1}, \quad p_t = \frac{t}{n}, \quad SP_t = \tfrac12, \quad K = 2, \quad KP_t = \frac{2t}{n(n+1)}.$$

The aggregate chance that originally the number of white
balls was t or less is

$$K(P_0 + P_1 + P_2 + \ldots + P_t) = \frac{t(t+1)}{n(n+1)} = f(t), \text{ say.}$$

If $f(t) = \tfrac12$ it is as likely as not that there were as many as
t white ; and, if n is great, $t = \dfrac{n}{\sqrt2}$ satisfies this equation
approximately.

Hence, when n is great, it is as likely as not that the
proportion of white balls to all was as great as $\dfrac{1}{\sqrt2} = .7 \ldots$

The chance is approximately $\tfrac12$ that the proportion was
between $\tfrac12$ and $\dfrac{\sqrt3}{2}$.

This example is very important, both as showing that the
result depends on the hypothesis made as to the relative
a priori chances of the unknown events between which we
have to choose, and as indicating that we can get a more
comprehensive result by aggregating the chances than by
taking them singly.
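
The arithmetic of the bag of six balls may be checked by a short computation. What follows is only a sketch in Python under the two hypotheses stated above; the function name `posterior` and the variable names are illustrative and are not taken from the text.

```python
from fractions import Fraction
from math import comb


def posterior(prior, likelihood):
    """Return K * p'_t * p_t for every hypothesis t, K making the total 1."""
    joint = [pp * p for pp, p in zip(prior, likelihood)]  # P_t = p'_t * p_t
    total = sum(joint)                                    # S P_t = 1 / K
    return [j / total for j in joint]


n = 6
# p_t: chance of drawing a white ball when t of the n balls are white.
likelihood = [Fraction(t, n) for t in range(n + 1)]

# First hypothesis: each ball is independently black or white with chance 1/2.
binomial_prior = [Fraction(comb(n, t), 2 ** n) for t in range(n + 1)]
post_binomial = posterior(binomial_prior, likelihood)

# Second hypothesis: the number of white balls is fixed by throwing a die,
# so that t = 0 is impossible and t = 1 ... 6 are equally likely.
die_prior = [Fraction(0)] + [Fraction(1, 6)] * 6
post_die = posterior(die_prior, likelihood)

print(post_binomial)
print(post_die)
```

Exact fractions are used so that the results can be compared directly with the ratios 0 : 6 : 30 : 60 : 60 : 30 : 6 and with t/21.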


Precision of p, the Proportion of a Particular Class in a Universe. 


We will first apply the principle of inverse probability to
the determination by sample of p, where pN is the number of
things having a certain characteristic in a universe containing
N things, and n are selected at random and p'n are found to
have the characteristic.

The chance that p'n should have been found from a given
p is ${}^nC_{p'n}\, p^{p'n} q^{q'n}$ (p. 262) where $q = 1 - p$, $q' = 1 - p'$.

If all values of p from 0 to 1 are a priori equally probable
then the chance that p'n should be found from any value of p,
from p' to x, is the sum of the chances from particular values

$= {}^nC_{p'n} \int_{p'}^{x} x^{p'n}(1 - x)^{q'n}\,dx$, and therefore by the theorem on p. 410
the chance that the original value of p was between p' and x is

$$\frac{{}^nC_{p'n}\int_{p'}^{x} x^{p'n}(1-x)^{q'n}\,dx}{{}^nC_{p'n}\int_{0}^{1} x^{p'n}(1-x)^{q'n}\,dx}$$

which can be reduced as follows to the form of the normal
curve of error if $\dfrac{1}{\sqrt n}$ is neglected.

Write $x = p' + z\sigma$, where $\sigma^2 n = p'q'$, and $1 - x = q' - z\sigma$. $\sigma$ is
of order $\dfrac{1}{\sqrt n}$.

Then $x^{p'n}(1-x)^{q'n} = p'^{\,p'n} q'^{\,q'n} f(z)$, where
$f(z) = \left(1 + \dfrac{z\sigma}{p'}\right)^{p'n}\left(1 - \dfrac{z\sigma}{q'}\right)^{q'n}$,
and the chance becomes $\int_0^z f(z)\,dz \div \int f(z)\,dz$, the second integral
being taken over the whole range of z.

If $x = 1$, $z = \dfrac{q'}{\sigma}$, and if $x = 0$, $z = -\dfrac{p'}{\sigma}$, which tend
to $\pm\infty$ when n is great.


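Then

$$\log f(z) = p'n \log\left(1 + \frac{z\sigma}{p'}\right) + q'n \log\left(1 - \frac{z\sigma}{q'}\right)$$

$$= z\sigma n - z\sigma n - \frac{z^2\sigma^2 n}{2}\left(\frac{1}{p'} + \frac{1}{q'}\right) + \text{terms involving } \sigma^3 n, \text{ i.e. terms of order } \frac{1}{\sqrt n}$$

$$= -\tfrac12 z^2, \text{ when } \frac{1}{\sqrt n} \text{ is neglected, since } p' + q' = 1.$$

$$\therefore\; P = \frac{\int_0^z e^{-\frac12 z^2}\,dz}{\int_{-\infty}^{\infty} e^{-\frac12 z^2}\,dz} = \frac{1}{\sqrt{2\pi}}\int_0^z e^{-\frac12 z^2}\,dz \quad \ldots \; (117)$$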


Hence the chance that the observations arose from a universe
in which the proportion was between p' and p' + p_1 is (writing
$u/\sigma$ for z) $\displaystyle\int_0^{p_1} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{u^2}{2\sigma^2}}\,du$, where $\sigma = \sqrt{\dfrac{p'q'}{n}}$.


The above analysis is based on that given in Todhunter's
History of the Theory of Probability, pp. 554 seq.

This is the converse of the theorem that the chance of
obtaining a value from p to p + p_1 in a sample from a known
universe is

$$\int_0^{p_1} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{u^2}{2\sigma^2}}\,du, \text{ where } \sigma = \sqrt{\frac{pq}{n}}.$$


The difficulty in the above analysis lies in the assumption 
that all values of p from o to 1 are a priori equally probable. 
The hypothesis can be elucidated as follows. 

Let n = 100 and p' = .1.

If the observations came from a universe where p = .07,
then $\sigma^2 = \dfrac{.07 \times .93}{100}$, $\sigma = .0255$, $\dfrac{p' - p}{\sigma} = 1.18 = z$. The aggregate
chance from p = .06 to p = .08 is approximately the
chance for p = .07, exactly given by the ordinate $\dfrac{1}{\sqrt{2\pi}} e^{-\frac12 z^2}$,
multiplied by an abscissa of .02 taken as a multiple of $\sigma$,
viz. .02 ÷ .0255 = .78, and equals .78 of $\dfrac{1}{\sqrt{2\pi}} e^{-\frac12 (1.18)^2} = .157$.

A series of such calculations leads to the following table.
Value of p.     Approximate chance
                of obtaining p' = .1

.00 - .02            .000
.02 - .04            .002
.04 - .06            .029
.06 - .08            .157
.08 - .10            .262
.10 - .12            .242
.12 - .14            .159
.14 - .16            .084
.16 - .18            .039
.18 - .20            .014
.20 - .22            .003
.22 - .24            .001
.24 - .26            .000

                     .994
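
The worked entry for the interval .06 to .08 can be retraced as below. This is a sketch only: the function name `chance_of_sample` and its default arguments are assumptions made for illustration, and the rule coded is simply the one described in the text (ordinate of the normal curve multiplied by the width of the interval expressed in units of the standard deviation).

```python
from math import exp, pi, sqrt


def chance_of_sample(p, p_obs=0.1, n=100, width=0.02):
    """Approximate chance of observing p_obs when the universe proportion
    lies in an interval of the given width centred on p."""
    sigma = sqrt(p * (1 - p) / n)               # standard deviation for this p
    z = (p_obs - p) / sigma                     # deviation in units of sigma
    ordinate = exp(-z * z / 2) / sqrt(2 * pi)   # height of the normal curve
    return ordinate * width / sigma             # ordinate times abscissa


entry = chance_of_sample(0.07)
print(round(entry, 3))
```

Calling the function at the mid-point of any other interval gives a value by the same rule; no claim is made that every printed entry is reproduced to the third decimal place.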


From the observations p would be given as .1 with standard
deviation $\sqrt{\dfrac{.1 \times .9}{100}} = .03$, and a considerable positive skewness.

The values of p differing from .1 by more than twice the
standard deviation give negligible probabilities whether we
suppose them a priori equally probable throughout the scale
or not.

This example then illustrates a theorem that we may give 
as obvious : that, except in the neighbourhood of the central 
value, it is indifferent what distribution of a priori probabilities 
of p we suppose. Over the small, important central region the 
assumption that the a priori probability of p over a region is 
proportional to that region is likely to be a good first approxi- 
mation, whatever the actual law. (Edgeworth, Statistical 
Journal, 1908, p. 387, and references there given.) 

We are not, then, liable to any considerable error from the 
assumption that underlies this and similar investigations, that 
the important values of the quantities sought are a priori 
equally probable at any point of the range of values that 
affects the analysis. 

We may now sum up the result of finding p by sample.
The most probable value in the universe is the observed value
p'. The probability of a deviation as great as $p_1$ from the
observed value is given approximately by the normal error
function with standard deviation

$$\sqrt{\frac{p'q'}{n}}, \text{ or } \sqrt{\frac{p'q'}{n}\left(1 - \frac{n}{N}\right)}$$

where N is the number in the universe and $\dfrac{n}{N}$ is not negligible.

The precision of a measurement is measured by the
reciprocal of the standard deviation of the errors to which it
is liable.


General Method . 

More generally let X' be any given function of n samples
chosen at random from a universe where the (unknown) corresponding
function is X, and let X = X' + x.

If we can show that the chance of obtaining the value X',
when the value in the universe is X, is of the form $P_x = P_0 e^{-\frac{x^2}{2\sigma^2}}$,
where $P_0$ is the maximum chance and is obtained when X = X',
$\sigma$ is constant, and x = X − X', then we can affirm with
reasonable certainty that the sample gives evidence that the
most probable value of the function in question is X', and that
the chance of deviations from X' is given by the normal
function with standard deviation $\sigma$. In the case above,
p' is X', p is X, x is $z\sigma$ and $n\sigma^2 = p'(1 - p')$. For the more
general case the process of inversion is not quite so direct.*

In order then to determine the precision of any measurement
based on a sample, we have to take three steps, the first
to find the standard deviation of the difference between the
true value and the observed value, the second to find the
chance that any assigned deviation would arise, the third to
apply the principle of inverse probability.


* If all values of X are a priori equally probable the chance that the
observations came from a universe when the value of X was within the
limits X' ± x is $2\int_0^x P_x\,dx$, if x is small, and by inverse probability the
chance that the value in the universe was within these limits is

$$2\int_0^x P_x\,dx \div \int_{-\infty}^{\infty} P_x\,dx = \frac{2}{\sigma\sqrt{2\pi}}\int_0^x e^{-\frac{x^2}{2\sigma^2}}\,dx.$$


Precision of the Arithmetic Average.

In Chapter III it was shown that if n quantities were
selected at random and independently from a frequency group
which satisfied certain conditions, the chance that the
average of the n quantities differed from the average of the
universe by as much as x was

$$2\int_x^{\infty} \frac{1}{\sigma_a\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma_a^2}}\,dx, \text{ where } \sigma_a = \frac{\sigma}{\sqrt n},$$

$\sigma$ being the standard deviation in the universe.

It will be shown immediately that $\sigma'$, the observed standard
deviation of the sample, differs from $\sigma$ by a quantity commensurate
with $\dfrac{\sigma}{\sqrt n}$, and hence if n is large, $\sigma'$ may be taken as
equivalent to $\sigma$.

We may now complete the argument and say that if in a
sample of n things, drawn independently from a group in
which no large portion is distant more than, say, twice its
standard deviation from its average, the average of the
sample is $\bar{x}$ and its standard deviation is $\sigma$, then the chance
that the average in the universe differs from $\bar{x}$ by as much
as x is

$$2\int_x^{\infty} \frac{\sqrt n}{\sigma\sqrt{2\pi}}\, e^{-\frac{n}{2}\left(\frac{x}{\sigma}\right)^2}\,dx, \quad (118)$$

when n is large.
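
Formula (118) is twice the upper tail of a normal curve with standard deviation σ/√n, and so may be evaluated with the complementary error function. The following is a minimal sketch; the function name `chance_mean_differs` is hypothetical and chosen only for this illustration.

```python
from math import erfc, sqrt


def chance_mean_differs(x, sigma, n):
    """Twice the integral from x to infinity of the normal curve whose
    standard deviation is sigma / sqrt(n), as in formula (118)."""
    return erfc(x * sqrt(n) / (sigma * sqrt(2)))


# Illustrative figures only: sample standard deviation 10, n = 100,
# so that the standard deviation of the average is 1.
print(chance_mean_differs(1.0, 10.0, 100))   # deviation of one standard deviation
print(chance_mean_differs(2.0, 10.0, 100))   # deviation of two standard deviations
```

The identity used is that twice the upper tail beyond x of a normal curve with standard deviation s equals erfc(x / (s√2)).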


Precision of the Standard Deviation. 

We will now extend this theorem to test the precision of 
the standard deviation and second moment as determined 
from the sample. 

Let $\bar{x}, \sigma, \mu_2 \ldots$ be the unknown constants of the universe, and
$\bar{x} + x', \sigma', \mu_2' \ldots$ be the corresponding values calculated from the
sample.

Let $\bar{x} + x_t$ be any observation, and write $x'_t = x_t - x'$.

The frequency curve of $x_t^2$ has $\mu_2$ for its average, and its
standard deviation is given by $\sigma_d^2 = \text{mean}\,(x_t^2 - \mu_2)^2 = \mu_4 - \mu_2^2$.
Its fourth moment is mean $(x_t^2 - \mu_2)^4 = \mu_8 - 4\mu_6\mu_2 + 6\mu_4\mu_2^2 - 3\mu_2^4 = M_4$, say.

In the universe $\dfrac{\mu_s}{\sigma^s}$ is finite by hypothesis for all values of s, and
therefore

$$\frac{M_4}{\sigma_d^4} = \left(\frac{\mu_8}{\mu_2^4} - 4\frac{\mu_6}{\mu_2^3} + 6\frac{\mu_4}{\mu_2^2} - 3\right) \div \left(\frac{\mu_4}{\mu_2^2} - 1\right)^2 \text{ is finite.}$$

Similarly, for any moment of $x_t^2$, $\dfrac{M_s}{\sigma_d^s}$ is finite.

Hence the average of the quantities $x_t^2$, as occurring in the
sample, by the theorem summarised on p. 312 and just used, may
differ from $\mu_2$ by an error with normal frequency and standard
deviation

$$\frac{\sigma_d}{\sqrt n} = \sqrt{\frac{\mu_4 - \mu_2^2}{n}}.$$

This average, $m_2$ say,

$$= \frac{S x_t^2}{n} = \frac1n S(x'_t + x')^2 = \frac1n S x_t'^2 + x'^2, \text{ since } S x'_t = 0, \; = \mu_2' + x'^2.$$

Now $x'^2$ is of order $\dfrac1n$ from formula (118), and the error in
$m_2$ has just been shown to be of order $\dfrac{1}{\sqrt n}$. Hence $x'^2$ can be
neglected, and $\mu_2'$ written for $m_2$.

Hence the observed $\mu_2'$ differs from $\mu_2$ in the universe by an
error with normal frequency and standard deviation

$$\sqrt{\frac{\mu_4 - \mu_2^2}{n}} \quad (119)$$

But $\sigma^2 = \mu_2$, $\delta\sigma = \dfrac{\delta\mu_2}{2\sqrt{\mu_2}}$ ; hence $\sigma'$ differs from $\sigma$ by an error
with normal frequency and standard deviation

$$\sqrt{\frac{\mu_4 - \mu_2^2}{4n\mu_2}} \quad (120)$$

If the universe was normal, $\mu_4 = 3\mu_2^2$, and the standard deviations
of the observed standard deviation and second moment become

$$\frac{\sigma}{\sqrt{2n}} \text{ and } \sigma^2\sqrt{\frac{2}{n}} \quad (121)$$

respectively.

By a similar method the standard deviation of $m_4$ may be found,
where $m_4 = \dfrac1n S(x'_t + x')^4 = \mu_4' + 4x'\mu_3' + 6x'^2\mu_2' + x'^4$.

Hence the error in $\mu_4'$ is of as low order as that in $m_4$, that is of
order $\dfrac{1}{\sqrt n}$.
vn 

We may therefore, in calculating the standard deviations of
$\mu_2'$ and $\sigma'$, replace the unknown $\mu_4$ and $\mu_2$ by the known $\mu_4'$ and $\mu_2'$,
and in calculating the standard deviation of the average we are
justified in writing $\sigma'$ for $\sigma$.
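
Formulae (119) to (121) reduce to simple arithmetic on the second and fourth moments. The sketch below is an illustration only; the function names are hypothetical, and the figures in the usage lines are invented to show that the general formulae agree with (121) when the fourth moment is three times the square of the second.

```python
from math import sqrt


def sd_of_second_moment(mu2, mu4, n):
    """Standard deviation of the observed second moment, formula (119)."""
    return sqrt((mu4 - mu2 ** 2) / n)


def sd_of_standard_deviation(mu2, mu4, n):
    """Standard deviation of the observed standard deviation, formula (120)."""
    return sqrt((mu4 - mu2 ** 2) / (4 * n * mu2))


# Illustrative normal case: sigma = 2, so mu2 = 4 and mu4 = 3 * mu2**2 = 48.
sigma, n = 2.0, 200
mu2, mu4 = sigma ** 2, 3 * sigma ** 4

print(sd_of_standard_deviation(mu2, mu4, n), sigma / sqrt(2 * n))      # (121), first part
print(sd_of_second_moment(mu2, mu4, n), sigma ** 2 * sqrt(2 / n))      # (121), second part
```

In practice the sample values of the moments would be supplied in place of the unknown true values, as the paragraph above allows when n is large.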
The standard deviations and frequency curve of errors in
higher moments or in the correlation coefficient cannot be, at
any rate readily, calculated by this method,* and the whole
basis is reconstituted and the arguments reset in the following
paragraphs which are based on the papers to which reference
is made at the head of the chapter.

Standard Deviations of the Average, etc., without Reference to 
Inverse Probability . 

Suppose that in a universe containing N measurable objects,
there are $N \times y_1$ at measurement $x_1$, $N \times y_2$ at $x_2 \ldots$, and that
n objects are selected at random, n/N being so small that the
chance of getting any value of x is not affected by previous
selections.

$$N = N \times y_1 + N \times y_2 + \ldots, \quad y_1 + y_2 + \ldots = 1.$$

Let $x'$, $\sigma'$, and $\mu_2'$ be the average, standard deviation and
second moment found from the samples. Required to determine
the precision of these values as representatives of the
average, standard deviation, and second moment for the
universe.

Suppose $x_1, x_2 \ldots$ to be measured from the (unknown) average
in the universe, so that $\bar{x} = S(x_t y_t) = 0$.

Let $\mu_2$ be the second moment for the universe, so that
$\mu_2 = S\,x_t^2 y_t$, and write $\mu_2 = \sigma^2$.

The sample will not, of course, contain precisely $n \times y_1$ at $x_1$,
$n \times y_2$ at $x_2$ etc.

Let the numbers actually found be $n(y_1 + e_1)$ at $x_1 \ldots n(y_t + e_t)$
at $x_t \ldots$

Then $e_1 + e_2 + \ldots + e_t + \ldots = 0$.

$x_1, x_2 \ldots$ are, of course, constant.

Since $y_t$ is the chance of finding an object at $x_t$ and the experiment
is made n times, $e_t$ has normal frequency with standard
deviation $\sqrt{\dfrac{y_t(1 - y_t)}{n}}$.

Hence the mean of all values of $e_t^2$ is $\dfrac{y_t(1 - y_t)}{n}$, and $e_t$ is of
order $1/\sqrt n$.


* The method is based on communications to the author from Professor
Edgeworth.
Let E be the aggregate error in all the compartments together
other than the s-th and the t-th. Write Y for $1 - y_s - y_t$.

Then $e_s + e_t + E = 0$

$$2e_se_t = E^2 - e_s^2 - e_t^2$$

$$\therefore\; \text{Mean } e_se_t = \tfrac12 \text{ mean } E^2 - \tfrac12 \text{ mean } e_s^2 - \tfrac12 \text{ mean } e_t^2$$

$$= \frac{1}{2n}\{Y(1 - Y) - y_s(1 - y_s) - y_t(1 - y_t)\}$$

$$= -\frac{y_sy_t}{n}$$

Now let F be any linear function of $y_1, y_2 \ldots$, so that
$$F = a_1y_1 + a_2y_2 + \ldots,$$
where $a_1, a_2 \ldots$ are known constants.

Write $F + f = a_1(y_1 + e_1) + \ldots + a_t(y_t + e_t) + \ldots$,
so that $f = a_1e_1 + \ldots + a_te_t + \ldots$

$$f^2 = S\,a_t^2e_t^2 + 2S\,a_sa_te_se_t.$$

Then, if $\sigma_f^2$ is written for the mean value of $f^2$ when all possible
values of $e_1, e_2, \ldots$ have been found in due proportion,

$$\sigma_f^2 = S\,a_t^2\,(\text{mean } e_t^2) + 2S\,a_sa_t\,(\text{mean } e_se_t)$$

$$= \frac1n\{S\,a_t^2y_t(1 - y_t) - 2S\,a_sa_ty_sy_t\}$$

$$= \frac1n\{S\,a_t^2y_t - F^2\} \quad (122)$$
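
Formula (122) may be verified exactly for a small case by listing every possible ordered sample. The sketch below does this with exact fractions; the constants, the chances and the function names are all invented for the illustration and do not come from the text.

```python
from fractions import Fraction
from itertools import product


def variance_by_formula(a, y, n):
    """(1/n) * { S a_t^2 y_t - F^2 }, formula (122)."""
    F = sum(ai * yi for ai, yi in zip(a, y))
    return (sum(ai * ai * yi for ai, yi in zip(a, y)) - F * F) / n


def variance_by_enumeration(a, y, n):
    """Mean of f^2 taken over every ordered sample of n independent draws."""
    F = sum(ai * yi for ai, yi in zip(a, y))
    total = Fraction(0)
    for draws in product(range(len(y)), repeat=n):
        prob = Fraction(1)
        for t in draws:
            prob *= y[t]                       # chance of this ordered sample
        sample_value = sum(a[t] for t in draws) / Fraction(n)   # F + f
        total += prob * (sample_value - F) ** 2
    return total


a = [1, 4, 9]                                            # illustrative constants a_t
y = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]     # illustrative chances y_t
n = 3

print(variance_by_formula(a, y, n), variance_by_enumeration(a, y, n))
```

For independent draws the two quantities agree exactly, whatever n may be; the normality of f is a separate matter and needs n large.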

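Put $a_1 = x_1 \ldots a_t = x_t \ldots$

$$F = S\,x_ty_t = 0$$
$$x' = F + f = S\,x_te_t$$

$$\sigma_{x'}^2 \;(\text{mean of } x'^2) = \frac1n S\,x_t^2y_t \text{ from (122)} = \frac{\mu_2}{n} = \frac{\sigma^2}{n} \quad \ldots \; (123)$$

$x'$ is therefore of the order $\dfrac{1}{\sqrt n}$.

Now put $a_1 = x_1^2 \ldots a_t = x_t^2$

$$\mu_2 = F = S\,x_t^2y_t, \quad \mu_2' = S(x_t - x')^2(y_t + e_t)$$

$$\mu_2' - \mu_2 = f = S\,x_t^2e_t - 2x'\,S\,x_ty_t + \text{terms involving } x'e_t \text{ and } x'^2$$

which are of order $\dfrac1n$.

$\therefore\; \mu_2' - \mu_2 = S\,x_t^2e_t$, since $S\,x_ty_t = 0$, if terms in $\dfrac1n$ are neglected.

$\therefore$ from (122)

$$\sigma_{\mu_2}^2 = \text{mean of } (\mu_2' - \mu_2)^2 = \frac1n\{S\,x_t^4y_t - (S\,x_t^2y_t)^2\} = \frac{\mu_4 - \mu_2^2}{n} \quad \ldots \; (124)$$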


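Now $\sigma^2 = \mu_2$.

Hence increments $\delta\sigma$, $\delta\mu_2$ of $\sigma$ and $\mu_2$ are connected by the
equation

$$2\sigma\,\delta\sigma = \delta\mu_2, \text{ or } \delta\sigma = \frac{\delta\mu_2}{2\sqrt{\mu_2}}.$$

Hence

$$\sigma_\sigma^2 = \text{mean } (\delta\sigma)^2 = \text{mean } \frac{(\delta\mu_2)^2}{4\mu_2} = \frac{\mu_4 - \mu_2^2}{4n\mu_2} \text{ from (124).} \quad (125)$$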

A similar analysis leads to the general result

$$\sigma_{\mu_p}^2 = \frac1n\left(\mu_{2p} - 2p\,\mu_{p+1}\mu_{p-1} + p^2\mu_{p-1}^2\mu_2 - \mu_p^2\right) \quad \ldots \; (126)$$

Hence the standard deviations of $\sigma$ and all moments involve
the factor $\dfrac{1}{\sqrt n}$, and if n is large the difference between the apparent
and true measurements is of the order $\dfrac{1}{\sqrt n}$ and may be neglected
in formulae involving them. Consequently in evaluating the
formulae 123 to 126 the calculated values of the moments $\mu_2'$ etc.
can be substituted for the unknown true values.


Notice that the standard deviation of each moment depends 
on the moment of twice its order, and this higher moment 
rapidly becomes great as the order is increased. In practice 
it is found that with ordinary values of n the moments above 
the 4th lack precision for this reason. 

If the frequency curve of the universe is normal $\mu_4 = 3\mu_2^2$
and
The first two results for the normal curve can be obtained more directly
as follows. Let $X_1, X_2 \ldots X_n$ be the measurements of n things taken at
random from a group whose frequency curve is

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - x_0)^2}{2\sigma^2}},$$

where $x_0$ and $\sigma$ are unknown.

$P_x$, the probability that these particular n things will be selected, is

$$\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X_1 - x_0)^2}{2\sigma^2}} \times \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X_2 - x_0)^2}{2\sigma^2}} \times \ldots$$

Let $\bar{x}$ be the average of the X's, $X_1 = \bar{x} + x_1$ etc., so that $S\,x_t = 0$.

$$\log P_x = -n\log\sigma - \frac n2\log 2\pi - \frac{1}{2\sigma^2}S\{\bar{x} - x_0 + x_t\}^2$$

$$= -n\log\sigma - \frac n2\log 2\pi - \frac{1}{2\sigma^2}\{n(\bar{x} - x_0)^2 + ns^2\},$$

where s is the standard deviation of the X's.

Here $\bar{x}$ and s are known and $\sigma$ and $x_0$ are to be determined.

$P_x$ is greatest when $\bar{x} - x_0$ is least, whatever the value of $\sigma$.

Give $x_0$ the value $\bar{x}$.

Then $P_x$ is greatest when $\dfrac{d\log P_x}{d\sigma}$ is zero, that is when

$$0 = -\frac{n}{\sigma} + \frac{ns^2}{\sigma^3}, \text{ and } \sigma = s.$$

Write $\sigma = s + \gamma$ and $x_0 = \bar{x} + \delta$, and let $P_0$ be the value of $P_x$ when
$\gamma = 0 = \delta$.

$$\text{Then } \log P_x - \log P_0 = -n\log\frac{\sigma}{s} - \frac{n}{2\sigma^2}(\delta^2 + s^2) + \frac n2,$$

$$\frac1n\log\frac{P_x}{P_0} = -\log\left(1 + \frac{\gamma}{s}\right) - \frac{\delta^2 + s^2}{2s^2}\left(1 + \frac{\gamma}{s}\right)^{-2} + \frac12$$

$$= -\frac{\delta^2}{2s^2} - \frac{\gamma^2}{s^2}, \text{ neglecting } \gamma^3, \gamma\delta^2 \text{ etc.}$$

$$P_x = P_0\, e^{-\frac{n\delta^2}{2s^2} - \frac{n\gamma^2}{s^2}} \quad (128)$$

Hence the errors in $\bar{x}$ and s are independent of each other, are of normal
frequency, and their standard deviations are respectively $\dfrac{s}{\sqrt n}$ and $\dfrac{s}{\sqrt{2n}}$, where
s is the standard deviation of the sample, and for it $\sigma$, the standard deviation
in the universe may be written without perceptible error when n is large, as
in the results already obtained.
Standard Deviation of the Correlation Coefficient.

The standard deviation of the error to which the correlation
coefficient is liable may be found as follows. (Here
Dr. Sheppard's method is followed.)

Let there be pairs of values, such as $(x_t, y_t)$, measured from
their averages, whose standard deviations are $\sigma_1, \sigma_2$ and second
moments $\lambda, \mu$. Let the whole number of pairs be N, and let $Nz_t$ be
situated at $(\bar{x} + x_t, \bar{y} + y_t)$, so that $z_1 + \ldots + z_t \ldots = 1$.

Also $S\,z_tx_t = 0 = S\,z_ty_t$.

Write $M_{ij}$ for the mean of $x^iy^j$ taken over all the pairs, so that
$M_{ij} = S\,z\,x^iy^j$. Write M for $M_{11}$.

Then if r is the coefficient of correlation of the N pairs,

$$r = \frac{M}{\sqrt{\lambda\mu}} \text{ and } \log r = \log M - \tfrac12\log\lambda - \tfrac12\log\mu.$$

Now let a selection of n pairs be made at random, and let the
number selected from the position $x_t, y_t$ be $(z_t + e_t)\,n$.

If $x', y'$ are the resulting averages

$$x' = S(z_t + e_t)x_t = S\,x_te_t, \text{ and } y' = S\,y_te_t.$$

The resulting deviations between the values in the sample and
the values in the universe of r, M, $\lambda$, $\mu$ are, by differentiating the
equation for log r, evidently connected by

$$\frac{\delta r}{r} = \frac{\delta M}{M} - \frac{\delta\lambda}{2\lambda} - \frac{\delta\mu}{2\mu}.$$

Now

$$\delta M = S(z_t + e_t)(x_t - x')(y_t - y') - S\,z_tx_ty_t = S\,x_ty_te_t - x'\cdot S\,z_ty_t - y'\cdot S\,z_tx_t,$$

when products of any two of the small quantities $e_t, x', y'$, which
are of order $\dfrac{1}{\sqrt n}$, are neglected.

$$\therefore\; \delta M = S\,x_ty_te_t.$$

As shown above (pp. 419-20),

$$\delta\lambda = S\,x_t^2e_t \text{ and } \delta\mu = S\,y_t^2e_t.$$

$$\therefore\; \frac{\delta r}{r} = S\,e_t\left(\frac{x_ty_t}{M} - \frac{x_t^2}{2\lambda} - \frac{y_t^2}{2\mu}\right).$$

Hence, from the general formula (122), if $\sigma_r$ is the standard
deviation of the errors in r,

$$\frac{\sigma_r^2}{r^2} = \frac1n\left\{S\,z_t\left(\frac{x_ty_t}{M} - \frac{x_t^2}{2\lambda} - \frac{y_t^2}{2\mu}\right)^2 - \left[S\,z_t\left(\frac{x_ty_t}{M} - \frac{x_t^2}{2\lambda} - \frac{y_t^2}{2\mu}\right)\right]^2\right\}$$

$$= \frac1n\left\{\frac{M_{22}}{M^2} + \frac{\lambda_4}{4\lambda^2} + \frac{\mu_4}{4\mu^2} - \frac{M_{31}}{\lambda M} - \frac{M_{13}}{\mu M} + \frac{M_{22}}{2\lambda\mu}\right\},$$

since

$$S\,z_t\left(\frac{x_ty_t}{M} - \frac{x_t^2}{2\lambda} - \frac{y_t^2}{2\mu}\right) = 1 - \tfrac12 - \tfrac12 = 0,$$

where $\lambda_4$ and $\mu_4$ are the fourth moments of the distribution of the
N x's and N y's.

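In the case where the original distribution is that given by
the normal correlation surface, $\lambda_4 = 3\lambda^2$, $\mu_4 = 3\mu^2$, $M = r\sigma_1\sigma_2$,
$M_{22} = (1 + 2r^2)\,\sigma_1^2\sigma_2^2$, $M_{31} = 3r\sigma_1^3\sigma_2$, $M_{13} = 3r\sigma_1\sigma_2^3$ (formula (106)),

$$\text{and } \frac{\sigma_r^2}{r^2} = \frac1n\left\{\frac{1 + 2r^2}{r^2} + \frac34 + \frac34 - 3 - 3 + \frac{1 + 2r^2}{2}\right\} = \frac{(1 - r^2)^2}{n\,r^2},$$

$$\text{and } \sigma_r = \frac{1 - r^2}{\sqrt n} \quad \ldots \; (129)$$

This is the value generally used, it being implicitly assumed
that the distribution approximates to the normal.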
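
The reduction of the general expression to formula (129) can be checked numerically. The sketch below is an illustration under assumed figures (r = .5, standard deviations 2 and 3, n = 400); the function names are hypothetical and the moments are those of the normal surface quoted above.

```python
from math import sqrt


def sd_of_r_general(r, lam, mu, lam4, mu4, M, M22, M31, M13, n):
    """Standard deviation of r from the general moment formula."""
    bracket = (M22 / M ** 2 + lam4 / (4 * lam ** 2) + mu4 / (4 * mu ** 2)
               - M31 / (lam * M) - M13 / (mu * M) + M22 / (2 * lam * mu))
    return abs(r) * sqrt(bracket / n)


def sd_of_r_normal(r, n):
    """Formula (129): (1 - r^2) / sqrt(n), for a normal correlation surface."""
    return (1 - r * r) / sqrt(n)


# Illustrative values for a normal surface.
r, s1, s2, n = 0.5, 2.0, 3.0, 400
lam, mu = s1 ** 2, s2 ** 2
lam4, mu4 = 3 * lam ** 2, 3 * mu ** 2
M = r * s1 * s2
M22 = (1 + 2 * r * r) * lam * mu
M31 = 3 * r * s1 ** 3 * s2
M13 = 3 * r * s1 * s2 ** 3

print(sd_of_r_general(r, lam, mu, lam4, mu4, M, M22, M31, M13, n))
print(sd_of_r_normal(r, n))
```

For other distributions the general function would be supplied with the observed moments, and the two results need not then agree.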

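The regression coefficient, when y is expressed in terms of x,
is $r\dfrac{\sigma_2}{\sigma_1} = \rho$, say. In the present notation $\rho = \dfrac{M}{\lambda}$, and we obtain
by a method similar to that just used,

$$\frac{\sigma_\rho^2}{\rho^2} = \frac1n\left\{\frac{M_{22}}{M^2} + \frac{\lambda_4}{\lambda^2} - \frac{2M_{31}}{\lambda M}\right\} \text{ in any distribution.}$$

Hence in normal distribution

$$\frac{\sigma_\rho^2}{\rho^2} = \frac1n\left\{\frac{1 + 2r^2}{r^2} + 3 - 6\right\} = \frac{1 - r^2}{n\,r^2},$$

$$\sigma_\rho = \frac{1}{\sqrt n}\cdot\frac{\sigma_2}{\sigma_1}\cdot\sqrt{1 - r^2}.$$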


In the case of normal distribution the result may be reached as follows.
(Here Professor Pearson's method is followed.)

Suppose pairs $x_1, y_1 \ldots x_n, y_n$ are chosen from a surface whose unknown
centre is $x_0, y_0$, standard deviations $\sigma_1, \sigma_2$, and mean product $r\sigma_1\sigma_2$.
Let $\bar{x}, \bar{y}, s_1, s_2, r'$ be calculated from the sample.

The chance of concurrence of the n pairs is

$$P_x = \frac{\exp\left\{-\dfrac{1}{2(1 - r^2)}\,S\left[\dfrac{(x_t - x_0)^2}{\sigma_1^2} + \dfrac{(y_t - y_0)^2}{\sigma_2^2} - \dfrac{2r(x_t - x_0)(y_t - y_0)}{\sigma_1\sigma_2}\right]\right\}}{\left(2\pi\sigma_1\sigma_2\sqrt{1 - r^2}\right)^n}$$

$$\log P_x = -n\log 2\pi\sigma_1\sigma_2 - \frac n2\log(1 - r^2) - \frac{n}{2(1 - r^2)}\left\{\frac{s_1^2 + d_1^2}{\sigma_1^2} + \frac{s_2^2 + d_2^2}{\sigma_2^2} - \frac{2r(r's_1s_2 + d_1d_2)}{\sigma_1\sigma_2}\right\}$$

where $d_1 = x_0 - \bar{x}$, $d_2 = y_0 - \bar{y}$.

Here $r, d_1, d_2, \sigma_1, \sigma_2$ are unknown, and $r', s_1, s_2$ known.
By expressing the conditions

$$\frac{\partial P}{\partial d_1} = 0 = \frac{\partial P}{\partial d_2} = \frac{\partial P}{\partial\sigma_1} = \frac{\partial P}{\partial\sigma_2} = \frac{\partial P}{\partial r}$$

we obtain the values of the five unknowns which make $P_x$ a maximum.

$$\frac{d_1}{\sigma_1^2} = \frac{r\,d_2}{\sigma_1\sigma_2}, \quad \frac{d_2}{\sigma_2^2} = \frac{r\,d_1}{\sigma_1\sigma_2},$$

whence $d_1 = d_2 = 0$, unless $r^2 = 1$.

Then, taking $d_1$ and $d_2$ to be zero,

$$(1 - r^2)\,\sigma_1^2\sigma_2^2 = s_1^2\sigma_2^2 - rr's_1s_2\sigma_1\sigma_2$$

and

$$(1 - r^2)\,\sigma_1^2\sigma_2^2 = s_2^2\sigma_1^2 - rr's_1s_2\sigma_1\sigma_2,$$

whence $\dfrac{s_1}{\sigma_1} = \dfrac{s_2}{\sigma_2} = k$, say, and $1 - r^2 = k^2(1 - rr')$,

and

$$\frac{r}{1 - r^2} - \frac{r}{(1 - r^2)^2}\left(\frac{s_1^2}{\sigma_1^2} + \frac{s_2^2}{\sigma_2^2} - \frac{2rr's_1s_2}{\sigma_1\sigma_2}\right) + \frac{r's_1s_2}{(1 - r^2)\,\sigma_1\sigma_2} = 0$$

$$\therefore\; r(1 - r^2) - 2rk^2(1 - rr') + r'(1 - r^2)k^2 = 0$$

Hence $r = r'$ and $k = 1$.

$P_x$ is greatest, therefore, when the values found in the sample are taken as
the values in the surface. Write $P_0$ for the value of $P_x$ so obtained.

Now write $\sigma_1 = s_1 + \gamma_1$, $\sigma_2 = s_2 + \gamma_2$, and $r = r' + \rho$, and expand all functions
in powers of the small quantities $d_1, d_2, \gamma_1, \gamma_2, \rho$, neglecting third powers.

We obtain



I 

~ 2(1 - 

(*x* 

nw 

S 2 

2 r'd } d t 

S l S 2 

) + 

r' 

1 — r 

_ /P7i , P7t\ 

' 2 V Si + s, / 


r' 2 


2 — 

r '* / 7i 

1 

y 

i4r'* 

2 

+ I _ 

5,S f 

2(1 - 

OW, 

* -T 

*.v 

2(1-0 

a P 

I 

Jii 

r'd,\* 


2- 

-r' a 


y* '' Y 

2(1 ~r‘ 

'*) Vs* 

TJ ■ 

25,« 

2{l 


Vs, 2-0 

s t 2 ->'* r ) 


rv\*_ . 

2-f'*Vs, 2d -r'*)J 2(1 -O'' 


Integrate successively between extreme limits

for $d_1$, regarding $d_2, \gamma_1, \gamma_2, \rho$ as constant,
for $d_2$, regarding $\gamma_1, \gamma_2, \rho$ as constant,
for $\gamma_1$, regarding $\gamma_2, \rho$ as constant,
and for $\gamma_2$, regarding $\rho$ as constant.

We then find that the whole chance of the observations arising from a
value $r' + \rho$, whatever the values of $x_0, y_0, \sigma_1, \sigma_2$, is

$$P = K e^{-\frac{n\rho^2}{2(1 - r'^2)^2}}$$

That is, the distribution is normal, with standard deviation $\sigma_r = \dfrac{1 - r'^2}{\sqrt n}$.
The work on pp. 421 and 424 shows that if the frequency
group from which samples are taken is normal, then the chances
of obtaining various errors in $\bar{x}$, $\sigma$, $\mu_2$ and r in the sample are
given by the normal probability function ; and inversely, that
the chances that the corresponding quantities in the universe
have various deviations from the observed quantities are also
so given. It remains to prove that under other conditions
the same result is obtained.

In each case the quantity concerned was put in the form

$$F + f = a_1(y_1 + e_1) + a_2(y_2 + e_2) + \ldots,$$

where $e_1 + e_2 + \ldots = 0$, and $f = 0$ when $0 = e_1 = e_2 = \ldots$
Also $y_1 + y_2 + \ldots = 1$.

The frequency curves of $e_1, e_2 \ldots$ are normal if n, the
number in the sample, is large, p. 418.

If $e_1, e_2 \ldots$ were independent of each other, or if the
number of separate values of $x_1, x_2 \ldots$ were so great that we
could treat them as independent, then we could at once apply
the theorem of pp. 295 seq. and state the frequency of f is
normal.

The full analysis (given in Appendix, Note 9) leads to
the result that normality may be presumed under the same
conditions affecting the universe from which the samples are
taken as lead to normality of the average, viz. : that the
universe is so confined that the ratio $\dfrac{\mu_t}{\sigma^t}$ is finite for all values
of t (p. 299).


Note added in 1936. It should perhaps have been more explicitly
stated that the methods of pp. 421-4 do not relate directly to the same
problem as those of pp. 418-20. The last-named supposes many samples
from one universe, the first considers the probability (or likelihood)
of a given sample from various universes. The forms of the results
tend to be the same as n increases ; but the treatment of the equations
(128) and at the bottom of p. 424 as frequency groups involves a priori
probability, as indicated at the top of this page. The problem “ given
the target how will shots be dispersed ” leads to Prof. R. A. Fisher's
method of “ variance ” ; the problem “ given the shot-marks, what was
the target,” which is the practical question in many cases when we have
only one sample, necessarily involves the reference to a priori probability.



CHAPTER X. 

TESTS OF CORRESPONDENCE BETWEEN DATA AND
FORMULAE.

In the general method of the representation of observa- 
tions by a mathematical formula, the question must arise 
how the adequacy of the formula is to be tested, or, as it is 
frequently phrased, a test of the goodness of fit is required. 

Consider for example the table used above (p. 310) of the 
weekly expenditure on food per “ unit ” in 970 families. 


Expenditure.            m'           m             e          Standard      e²/m
                     number of    calculated   difference.   deviations.
                      cases.       numbers.

Not exceeding 5.5s.     18           22            4            4.6           .7
 5.5 . . . .           107          123           16           10.4          2.1
 7.5 . . . .           255          234           21           13.3          1.9
 9.5 . . . .           245          249            4           13.6           .1
11.5 . . . .           173          168            5           11.8           .1
13.5 . . . .           101           89           12            9.0          1.6
15.5 . . . .            38           51           13            7.0          3.3
17.5 . . . .            17           22            5            4.6          1.1
19.5 . . . .             9           11            2            3.3           .4
Over 21.5 .              7            1            6             -          36.0

Totals .               970          970           88             -          47.3

The calculated numbers are from the second approxima- 
tion to the Law of Great Numbers. A rough method formerly 
used was to add the differences between the calculated numbers 
and the numbers observed in each compartment, irrespective 
of sign, and to express this total as a percentage of the number 
of cases. The “ percentage misfit ” thus calculated is
88 ÷ 9.70 = 9.1 per cent.
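
The percentage misfit is a one-line calculation, sketched here from the observed and calculated numbers of the table above. The variable names are illustrative only.

```python
# Observed numbers (m') and calculated numbers (m) in the ten grades of the table.
observed = [18, 107, 255, 245, 173, 101, 38, 17, 9, 7]
calculated = [22, 123, 234, 249, 168, 89, 51, 22, 11, 1]

total_cases = sum(observed)
# Differences are added irrespective of sign.
total_difference = sum(abs(o - c) for o, c in zip(observed, calculated))
percentage_misfit = 100 * total_difference / total_cases

print(total_cases, total_difference, round(percentage_misfit, 1))
```

Merging adjacent grades before running the same calculation shows how the misfit generally diminishes when compartments are combined.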

The weakness of this method is that it is not related to 
any measurement of probability, and one cannot tell at sight 
whether the fit is good or not. Of two competing formulae, 
the presumption is that that which gives the lower percentage 
misfit is the better ; also when we have several sets of similar
observations we can tell roughly by this method which is
nearest to the formula, and in some cases in which set the
observations are most regular.

The percentage misfit is generally diminished if compart- 
ments are merged together. 

As regards the contents of individual compartments, we
already have a simple test. If $m_t$ is the calculated number in
a compartment when there are N observations in all, the
chance of finding $m_t + e_t$ observations in this compartment in
a random selection is

$$\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{e_t^2}{2\sigma^2}} \text{ (formula (19)) where } \sigma^2 = \frac{m_t}{N}\left(1 - \frac{m_t}{N}\right)N,$$

and the probability of exceeding any assigned multiple or
sub-multiple of $\sigma$ is given by the table (p. 271). The standard
deviation for each grade in the above example except the last
is given, and it is seen that four out of nine errors are less than $\sigma$,
their standard deviation, two are between $\sigma$ and $\tfrac32\sigma$, and the
remaining three less than $2\sigma$. No separate measurement is
improbable, and therefore the whole grouping may be presumed
to be not improbable, except the final number, 7 above
21.5s.
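
The count of errors falling below σ, between σ and 3σ/2, and below 2σ can be retraced for the first nine grades as follows. This is a sketch only, with illustrative variable names; the standard deviation is computed from the formula just given.

```python
from math import sqrt

# First nine grades of the table: observed and calculated numbers.
observed = [18, 107, 255, 245, 173, 101, 38, 17, 9]
calculated = [22, 123, 234, 249, 168, 89, 51, 22, 11]
N = 970

# Error in each grade expressed as a multiple of its standard deviation.
ratios = []
for o, m in zip(observed, calculated):
    sigma = sqrt(m * (1 - m / N))
    ratios.append(abs(o - m) / sigma)

below_sigma = sum(1 for q in ratios if q < 1)
between_1_and_1_5 = sum(1 for q in ratios if 1 <= q < 1.5)
between_1_5_and_2 = sum(1 for q in ratios if 1.5 <= q < 2)

print(below_sigma, between_1_and_1_5, between_1_5_and_2)
```

The last grade is left out here, as in the text, because its calculated number is too small for the normal form to apply.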

That numbers in extreme grades should be discontinuous 
in relation to middle grades is common in many classes of 
observations. 

The deviations are not independent, however, since their 
total must be zero ; and even if the deviation in one compart- 
ment taken by itself is improbably large, it may yet not be 
improbable when all the compartments are considered. A 
measurement which allows for this modification has been 
devised by Professor Pearson, and part of the analysis in a 
simplified form, a brief table of the results, and some 
applications are given in the following paragraphs (see The 
Philosophical Magazine , No. 302, July, 1900, pp. 157-175). 

Suppose that a formula, which is presumed to represent
the distribution of observations, leads to the expectation of
$m_1, m_2, \ldots, m_n$ observations in n grades or compartments,
where $N = m_1 + m_2 + \ldots + m_n$ is the whole number of
observations.

In an experiment or group of observations, suppose that
$(m_1 + e_1), \ldots, (m_t + e_t), \ldots, (m_n + e_n)$ are found in the
compartments, so that $e_1 + \ldots + e_t + \ldots + e_n = 0$.


Write

\[ p_t = \frac{m_t}{N}. \]

Then $p_t$ is the chance that an observation from a group satisfying
perfectly the formula will fall into the t-th grade.

The chance that $m_t + e_t$ will fall into this grade when N are
chosen at random from an indefinitely large universe is

\[ \frac{1}{\sigma_t\sqrt{2\pi}}\, e^{-e_t^2/2\sigma_t^2}, \]

where $\sigma_t^2 = p_t(1 - p_t)N = p_t q_t N$, where $q_t = 1 - p_t$.

It can be shown that the joint chance of the errors named is

\[ K e^{-\frac{1}{2}\chi^2}, \quad \text{where } \chi^2 = S\,\frac{e_t^2}{m_t}, \text{ and } S\,e_t = 0, \]

K being a constant.

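For, if there were only two compartments, $e_1 + e_2 = 0$, and the
joint chance equals the chance of either.

Then $\sigma_1^2 = \sigma_2^2 = \frac{m_1 m_2}{N}$, $m_1 + m_2 = N$, and $e_1^2 = e_2^2$.

The chance is

\[ \sqrt{\frac{N}{2\pi m_1 m_2}}\; e^{-\frac{1}{2}\left(\frac{e_1^2}{m_1} + \frac{e_2^2}{m_2}\right)}, \quad \text{since } \frac{1}{m_1} + \frac{1}{m_2} = \frac{N}{m_1 m_2}. \]

If there are three compartments

\[ e_1 + e_2 + e_3 = 0, \quad m_1 + m_2 + m_3 = N, \quad \sigma_1^2 = m_1\left(1 - \frac{m_1}{N}\right) = \frac{m_1(m_2 + m_3)}{N}, \]

and similarly for $\sigma_2^2$ and $\sigma_3^2$.

\[ 2e_1e_2 = e_3^2 - e_1^2 - e_2^2, \]

\[ \therefore\ r\sigma_1\sigma_2 = \text{mean } e_1e_2 = \tfrac{1}{2}(\sigma_3^2 - \sigma_1^2 - \sigma_2^2) = -\frac{m_1m_2}{N}. \quad \text{(Compare p. 419.)} \]

The chance of the concurrence of $e_1$ and $e_2$, and therefore
of $e_3$ also, is given by the normal correlation surface as

\[ \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\; e^{-\frac{1}{2(1-r^2)}\left(\frac{e_1^2}{\sigma_1^2} + \frac{e_2^2}{\sigma_2^2} - \frac{2re_1e_2}{\sigma_1\sigma_2}\right)}. \]

\[ \sigma_1^2\sigma_2^2(1 - r^2) = \frac{m_1m_2(m_2 + m_3)(m_1 + m_3) - m_1^2m_2^2}{N^2} = \frac{m_1m_2m_3}{N}, \]

since $m_1 + m_2 + m_3 = N$.

Hence the index of e is

\[ -\frac{N}{2m_1m_2m_3}\left(e_1^2\sigma_2^2 + e_2^2\sigma_1^2 - 2r\sigma_1\sigma_2e_1e_2\right) \]

\[ = -\frac{N}{2m_1m_2m_3}\left\{\frac{e_1^2m_2(m_1 + m_3)}{N} + \frac{e_2^2m_1(m_2 + m_3)}{N} + \frac{2m_1m_2e_1e_2}{N}\right\} \]

\[ = -\frac{1}{2m_1m_2m_3}\left\{(e_1 + e_2)^2m_1m_2 + e_1^2m_2m_3 + e_2^2m_1m_3\right\} \]

\[ = -\frac{1}{2}\left(\frac{e_1^2}{m_1} + \frac{e_2^2}{m_2} + \frac{e_3^2}{m_3}\right), \quad \text{since } e_1 + e_2 = -e_3. \]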

Now if the second and third compartments had been merged
into one containing M + E observations, where $M = m_2 + m_3$ and
$E = e_2 + e_3$, the chance would have been

\[ K_1 e^{-\frac{1}{2}\left(\frac{e_1^2}{m_1} + \frac{E^2}{M}\right)}, \]

where $K_1$ is a constant.

The effect, therefore, of dividing the second compartment without
changing the first is to alter the constant and to replace
$\frac{E^2}{M}$ by $\frac{e_2^2}{m_2} + \frac{e_3^2}{m_3}$ in the index.

Similarly if two compartments are given, the effect of dividing
the third compartment without changing the first two must be to
alter the constant and to replace $\frac{E^2}{M}$ by $\frac{e_3^2}{m_3} + \frac{e_4^2}{m_4}$ in the index, and
so on.

Hence for n compartments the chance, P, of errors $e_1, e_2, \ldots, e_n$ is

\[ K e^{-\frac{1}{2}\chi^2}, \quad \text{where } \chi^2 = \frac{e_1^2}{m_1} + \frac{e_2^2}{m_2} + \ldots + \frac{e_n^2}{m_n}, \]

and $e_1 + e_2 + \ldots + e_n = 0$ . . . . (130)

Notice that $\chi^2$ is the same expression as is used in obtaining the
coefficient of contingency.

[A proof of the formula, without the above method of 
induction, is given by Pearson, by the use of the multiple 
correlation equation. See also Note ii, p. 454 below.] 

If the selections in the compartments had been independent
and without the condition that $e_1 + e_2 + \ldots = 0$, the chance
would have been

\[ K' e^{-\frac{1}{2}\chi^2} \times e^{-\frac{1}{2}\left(\frac{e_1^2}{N - m_1} + \frac{e_2^2}{N - m_2} + \ldots\right)}, \]

for the index would have been

\[ -\frac{1}{2}\left(\frac{e_1^2}{\sigma_1^2} + \ldots\right) = -\frac{1}{2}\left(\chi^2 + \frac{e_1^2}{N - m_1} + \ldots\right). \]

If there are many compartments and the largest of the
fractions $\frac{m_t}{N}$ is small, the second part of the index is negligible
compared with the first, and the two expressions tend to
equality, and the effect of the correlation is small.

The chance of the occurrences if there is no correlation is
less than that when there is correlation, since the last factor,
if not negligible, is less than 1. (The constant is eliminated in
further processes.) Hence the aggregation of uncorrelated
chances, which is simpler than the present method, gives an
unduly unfavourable view of the appropriateness of a formula.

The chance of every system of errors that gives a particular
value of $\chi^2$ is the same. Now, when the probability of a
deviation from the mean in normal frequency is in question,
it is customary to measure the probability that so great a
deviation to left or right should have occurred, viz.,

\[ 2\int_z^\infty \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}\,dz. \]

Similarly here we may measure the chance of the occurrence
of the system of errors or a less probable system by evaluating

\[ \int\!\!\int \ldots K e^{-\frac{1}{2}\chi^2}\,d\chi, \quad \text{where } d\chi \text{ is written for } de_1 \cdot de_2 \ldots de_n, \]

and the integral is (n − 1)-fold and extended from $\chi$ to $\infty$, with
the condition $e_1 + \ldots + e_n = 0$, K being so chosen that

\[ \int_0^\infty K e^{-\frac{1}{2}\chi^2}\,d\chi = 1. \]

The existence of this condition makes the integration
complicated, and reference should be made to Pearson's
original analysis for its working out.

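The result is that

\[ P = \sqrt{\frac{2}{\pi}}\int_\chi^\infty e^{-\frac{1}{2}\chi^2}\,d\chi + \sqrt{\frac{2}{\pi}}\, e^{-\frac{1}{2}\chi^2}\left(\frac{\chi}{1} + \frac{\chi^3}{1\cdot3} + \frac{\chi^5}{1\cdot3\cdot5} + \ldots + \frac{\chi^{n-3}}{1\cdot3\cdot5\ldots(n-3)}\right) \]

when n is even, and

\[ P = e^{-\frac{1}{2}\chi^2}\left(1 + \frac{\chi^2}{2} + \frac{\chi^4}{2\cdot4} + \ldots + \frac{\chi^{n-3}}{2\cdot4\cdot6\ldots(n-3)}\right) \]

when n is odd. (131)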
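
The two cases of formula (131) can be evaluated directly. The following is a minimal sketch only, not Pearson's own computation: the function name `pearson_p` is chosen here for illustration, and the tail integral for even n is taken from the complementary error function of the Python standard library.

```python
import math

def pearson_p(chi2, n):
    """P for n compartments, following the two cases of formula (131)."""
    if n < 2 or chi2 < 0:
        raise ValueError("need n >= 2 and chi2 >= 0")
    chi = math.sqrt(chi2)
    if n % 2 == 1:
        # n odd: e^(-chi^2/2) (1 + chi^2/2 + chi^4/(2.4) + ... + chi^(n-3)/(2.4...(n-3)))
        term = total = 1.0
        for k in range(2, n - 2, 2):
            term *= chi2 / k
            total += term
        return math.exp(-chi2 / 2) * total
    # n even: normal tail, sqrt(2/pi) * integral from chi to infinity of e^(-x^2/2) dx,
    # which equals erfc(chi / sqrt(2)), plus the series in odd powers of chi.
    tail = math.erfc(chi / math.sqrt(2))
    term, total = 0.0, 0.0
    for k in range(1, n - 2, 2):
        term = chi if k == 1 else term * chi2 / k
        total += term
    return tail + math.sqrt(2 / math.pi) * math.exp(-chi2 / 2) * total

if __name__ == "__main__":
    for n, chi2 in [(3, 1), (3, 6), (4, 2), (5, 3), (7, 13)]:
        print(n, chi2, round(pearson_p(chi2, n), 3))
```

For n = 3 the series reduces to the single term $e^{-\frac{1}{2}\chi^2}$, so that $\chi^2 = 6$ gives P close to .050.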

A table of the values of P for various values of $\chi^2$ and n
is given in Biometrika, Vol. I, pp. 155 seq. We can, in a very
brief form, obtain a working rule for determining whether a
formula does or does not adequately represent an observed
group by picking out values of $\chi^2$ which for a given n make
P = ½ or slightly more, or, further up the scale of improbability,
make P = .0455 or slightly less, which corresponds to
twice the standard deviation in the normal curve.


  n.     χ².     P.       χ².     P.

   3      1     .61        6     .050
   4      2     .57        8     .046
   5      3     .56       10     .040
   6      4     .55       12     .035
   7      5     .54       13     .043
   8      6     .54       15     .036
   9      7     .54       16     .042
  10      8     .53       18     .035
  11      9     .53       19     .040
  12     10     .53       20     .045
  13     11     .53       22     .038
  14     12     .53       23     .042
  15     13     .53       24     .046
  16     14     .526      26     .038
  17     15     .525      27     .041
  18     16     .524      28     .045
  19     17     .523      30     .037
  20     18     .522
  25     23     .520
  30     28     .518



If $\chi^2 < n - 2$, it is at least an even chance (as likely as not) that the
observations would be found from a group represented by the formula.

If $\chi^2 > 2n$, the improbability is considerable.

Strictly, the test should be applied using as many compart- 
ments as are given by the observations, for the merging of 
compartments affects the resulting value of P ; but it is often 
difficult to get back to ungraded observations, and in the case 
of continuous variables, such as height, the original grades 
would be as fine as the measurements could be made. 

A more serious difficulty is that in any compartment the
observed $m_t + e_t$ must be integral, while $m_t$ is in general not
integral, and some value of $e_t$ would be found in the most
perfect representation. In consequence, the number to be
expected in the least occupied compartment must be reasonably
large, or we obtain spurious contributions to $\chi^2$. This in
practice rules out detailed extreme compartments, and in their
rejection or fusion an element of arbitrariness is introduced
and no fine measurement is possible.

On the other hand, when we are testing the applicability
of the normal curve of error, or the general law of great numbers
based on Edgeworth’s hypothesis (pp. 298-9), there is no expectation
of closeness of fit on abscissae beyond a small multiple of
the standard deviation (the smaller as the number of independent
elements that contribute to the measurement
diminishes), so that the test is only applicable to the
well-occupied central compartments; but in choosing the extent
over which the test is made, the fineness of the method is lost.

Hence, only a broad, but often sufficiently definite, result 
can be obtained. 


Illustrations. 

If we neglect the extreme grade in Example 7, on p. 310,
$\chi^2$ = 11.3, n = 9, P = .18, and the formula “2nd approx.”
is adequate.

If we take the Pearsonian formula, on the same page,
$\chi^2$ = 21.4, n = 9, P = .006, but if we exclude the lowest as
well as the highest grade, $\chi^2$ = 4.1, n = 8, P = .77; hence
this formula expresses the central eight grades but not either
extreme.

The same conclusions are reached if we simply take the
standard deviations of the grades separately.

In the table on p. 309 relating to the ages of school children,
n = 8. The normal curve gives $\chi^2$ = 16.7 and P = .02,
which is not satisfactory. The second approximation, however,
gives $\chi^2$ = .47 and P is indistinguishable from 1.

In the experiment on the numbers of letters in words
(pp. 305-6), the sum of 10 words, graded by 5 letters, gives
n = 13, and with the normal curve $\chi^2$ = 33, P = .001, or
omitting the lowest and two highest extreme grades, n = 10,
$\chi^2$ = 6.1, P = .73. The second approximation, however,
including all grades, gives $\chi^2$ = 8.4, P = .74.

The sums of 100 words graded by 20 letters give n = 10,
$\chi^2$ = 2.96, P = .965 with the normal curve, and no further
approximation can improve on this.

An example of a different kind is found, when a distribution
found by sample is compared with the whole group from
which the sample is taken, to verify the rules of sampling or
the adequacy of the method.


Number of Companies Paying Dividends at Various Rates.

                              Number in   Relative numbers    Standard     e²
                              sample      in all companies    deviation.   --
                              (m + e).    (m).                             m

Below 3 per cent.                34            30                5.3       .53
3 per cent.                     108           108.8              8.9      0
4 per cent.                     117           124.4              9.3       .44
5 per cent.                      60            70.8              7.4      1.65
6 per cent. to 8 per cent.       48            43.2              6.2       .53
8 per cent.                      33            22.8              4.6      4.57

                                400           400                         7.72

Here n = 6, $\chi^2$ = 7.72, P = .185. The result is fairly good but spoilt
by the highest grade.
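
The arithmetic of the last column may be retraced as follows. This is an illustrative sketch only: the function name `chi_square` is ours, and the sample numbers 34, 108 and 117 are those implied by the column totals and the printed contributions, so they should be treated as an assumption about the printed table.

```python
def chi_square(observed, expected):
    """Sum of e^2/m over the compartments, where e = observed - expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - m) ** 2 / m for o, m in zip(observed, expected))

# Dividend example: numbers in the sample of 400, and the numbers expected
# from the proportions found in all companies (assumed readings, see above).
sample = [34, 108, 117, 60, 48, 33]
expected = [30.0, 108.8, 124.4, 70.8, 43.2, 22.8]

chi2 = chi_square(sample, expected)

if __name__ == "__main__":
    for o, m in zip(sample, expected):
        print(o, m, round((o - m) ** 2 / m, 2))
    print("chi-square =", round(chi2, 2))
```

The errors sum to zero, as the condition attached to formula (130) requires, and by far the largest single contribution comes from the highest grade.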


This test has been applied to the distribution in two dimen- 
sions, in the experiment tabulated on p. 394. 

The 24 squares, 3 to left and right of centre, and 2 above 
and below it, which contain in theory n or more observations, 
were taken as separate compartments. Outlying squares were 
grouped in the 9 regions shown by the thick lines, rather 
arbitrarily, so as to get contiguous squares which aggregated 
to at least 9 expected observations in the second approxima- 
tion. The results are as follows : — 


                          Normal surface.       2nd approximation.

                            χ².       P.          χ².       P.

24 central squares         20.8      .59         17.5      .79
9 outlying regions         27.8                  10.1
33 regions                 48.6      .035        27.6      .59



The improvement in the outlying regions by the use of 
the second approximation is very marked. 

Note . — In application of the test to double or manifold 
tables, as those on pp. 372-3, the procedure is different 
according as the sub-totals n 1 . . . m t . . . are supposed to be 
given or not. See Economica, No. 7, p. 1, and No. 8, p. 139, 
and the Statistical Journal, 1922, pp. 87 seq. 


MATHEMATICAL NOTES.

1. — Wallis’s Theorem for the Value of π.

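By simple graphic considerations it is evident that when n
is a positive integer

\[ \int_0^{\pi/2}\sin^{2n+1}x\,dx < \int_0^{\pi/2}\sin^{2n}x\,dx < \int_0^{\pi/2}\sin^{2n-1}x\,dx \]

\[ \therefore\ \frac{2\cdot4\cdot6\ldots2n}{3\cdot5\cdot7\ldots(2n+1)} < \frac{1\cdot3\cdot5\ldots(2n-1)}{2\cdot4\cdot6\ldots2n}\cdot\frac{\pi}{2} < \frac{2\cdot4\cdot6\ldots(2n-2)}{3\cdot5\cdot7\ldots(2n-1)} \]

\[ \therefore\ \frac{2^{2n}(n!)^2}{(2n+1)!} < \frac{(2n)!}{2^{2n}(n!)^2}\cdot\frac{\pi}{2} < \frac{2^{2n}(n!)^2}{(2n)!}\cdot\frac{1}{2n} \]

\[ \therefore\ \frac{2^{2n}(n!)^2}{(2n)!\sqrt{2n+1}} < \sqrt{\frac{\pi}{2}} < \frac{2^{2n}(n!)^2}{(2n)!\sqrt{2n}} \]

\[ \therefore\ \sqrt{\frac{\pi}{2}} = \frac{2^{2n}(n!)^2}{(2n)!\sqrt{2n}}, \text{ correct to } \frac{1}{n} \quad . . . . (132) \]

(See Gibson, Treatise on the Calculus, 1896, Ex. XXVI. 22.)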
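
The last pair of inequalities can be checked numerically. The sketch below is offered as an illustration; the function name `wallis_bounds` is ours, and exact integer factorials are used so that only the final division is in floating point.

```python
import math

def wallis_bounds(n):
    """Lower and upper bounds for sqrt(pi/2) given by the inequalities before (132)."""
    ratio = 4 ** n * math.factorial(n) ** 2 / math.factorial(2 * n)
    return ratio / math.sqrt(2 * n + 1), ratio / math.sqrt(2 * n)

if __name__ == "__main__":
    target = math.sqrt(math.pi / 2)
    for n in (5, 50, 500):
        lower, upper = wallis_bounds(n)
        print(n, lower, target, upper)
```

The two bounds differ in the ratio $\sqrt{(2n+1)/2n}$, so that the interval closes at the rate 1/n stated in (132).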

2 . — Sum of Powers of Integers. 

If we suppose

\[ S_m = \sum_{t=1}^{m} t^r = am^{r+1} + bm^r + cm^{r-1} + \ldots, \]

we can find a, b, c . . . by induction.

For

\[ (m+1)^r = S_{m+1} - S_m = a\{(m+1)^{r+1} - m^{r+1}\} + b\{(m+1)^r - m^r\} + c\{(m+1)^{r-1} - m^{r-1}\} \ldots \]

Equating coefficients of $m^r$, $m^{r-1}$, . . . , we have

\[ 1 = a(r+1), \]

\[ r = a\,\frac{(r+1)r}{2} + br, \text{ and } b = \tfrac{1}{2}, \]

\[ \frac{r(r-1)}{2} = a\,\frac{(r+1)r(r-1)}{6} + b\,\frac{r(r-1)}{2} + c(r-1), \text{ and } c = \frac{r}{12}, \text{ etc.} \]

\[ \therefore\ \frac{1^r + 2^r + \ldots + m^r}{m^{r+1}} = \frac{1}{r+1} + \frac{1}{2m} + \frac{r}{12m^2} + \ldots \]

\[ = \frac{1}{r+1}, \text{ if } \frac{1}{m} \text{ is neglected,} \]

\[ = \frac{1}{r+1} + \frac{1}{2m}, \text{ if } \frac{1}{m^2} \text{ is neglected} \quad . . . . (133) \]
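
The first three coefficients can be verified on particular cases. The following is an illustrative sketch, with function names of our own choosing; exact integer arithmetic is used for the sum.

```python
def power_sum_ratio(r, m):
    """(1^r + 2^r + ... + m^r) / m^(r+1), with the sum formed exactly."""
    return sum(t ** r for t in range(1, m + 1)) / m ** (r + 1)

def three_term_approximation(r, m):
    """1/(r+1) + 1/(2m) + r/(12 m^2), the terms found above."""
    return 1 / (r + 1) + 1 / (2 * m) + r / (12 * m ** 2)

if __name__ == "__main__":
    for r, m in [(1, 10), (3, 1000), (5, 1000)]:
        print(r, m, power_sum_ratio(r, m), three_term_approximation(r, m))
```

For r = 1, 2 and 3 the three terms are exact; for higher powers the remainder is of order $1/m^4$.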


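3. — Stirling’s Formula for m!

The first approximation to this formula may be obtained
from Wallis’s Theorem as follows.

Write

\[ z = \frac{(2m)!}{(m-1)!\,(2m)^{m+1}} = (2m-1)(2m-2)\ldots(2m-m) \div (2m)^m. \]

Then

\[ -\log z = \frac{1 + 2 + \ldots + m}{2m} + \ldots + \frac{1^r + 2^r + \ldots + m^r}{r(2m)^r} + \ldots = S\,\frac{1}{r\,2^r}\left(\frac{m}{r+1} + \frac{1}{2} + \frac{r}{12m}\right), \]

by Note 2, if higher powers of $\frac{1}{m}$ are neglected.

Now

\[ S\,\frac{1}{r\,2^r} = -\log(1 - \tfrac{1}{2}), \quad S\,\frac{1}{2^r} = 1, \]

\[ S\,\frac{1}{r(r+1)2^r} = S\,\frac{1}{r\,2^r} - 2\,S\,\frac{1}{(r+1)2^{r+1}} = -\log(1 - \tfrac{1}{2}) + 2\{\log(1 - \tfrac{1}{2}) + \tfrac{1}{2}\} = 1 - \log 2. \]

Hence

\[ -\log z = m\log\left(\frac{e}{2}\right) - \tfrac{1}{2}\log(1 - \tfrac{1}{2}) + \frac{1}{12m} \]

and

\[ z = \left(\frac{2}{e}\right)^m \cdot 2^{-\frac{1}{2}} \cdot e^{-\frac{1}{12m}} = 2^{m-\frac{1}{2}}\,e^{-m}, \text{ if } \frac{1}{m} \text{ is neglected.} \]

But by Wallis’s theorem

\[ \frac{(2m)!}{2^{2m}(m!)^2} = \frac{1}{\sqrt{\pi m}}, \text{ correct to } \frac{1}{m}, \]

and $(2m)! = 2 \cdot m! \cdot (2m)^m \cdot z$,

\[ \therefore\ m! = m^m e^{-m}\sqrt{2\pi m} \quad . . . . (134) \]

This formula gives an error of less than 1 per cent. for the
value of 10! and rapidly reaches considerable accuracy if m
is increased.

In its more complete form it is

\[ m! = m^m\sqrt{2\pi m}\; e^{-m + \frac{1}{12m} - \frac{1}{360m^3} + \ldots} \]

(See Chrystal's Algebra, Chap. XXX.)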
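
The statement about 10! is easily tested. A minimal sketch follows; the function name `stirling` and its `extended` switch are ours and simply select between formula (134) and the more complete form with the two correcting terms.

```python
import math

def stirling(m, extended=False):
    """Formula (134); with extended=True the terms 1/(12m) - 1/(360 m^3) are added to the index."""
    value = m ** m * math.exp(-m) * math.sqrt(2 * math.pi * m)
    if extended:
        value *= math.exp(1 / (12 * m) - 1 / (360 * m ** 3))
    return value

if __name__ == "__main__":
    exact = math.factorial(10)
    for flag in (False, True):
        approx = stirling(10, extended=flag)
        print(flag, approx, abs(approx - exact) / exact)
```

The first form falls short of 10! by rather more than eight parts in a thousand; the two further terms remove nearly all of this defect.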


4. — The Euler-Maclaurin Theorem, which connects Summation
with Integration.

Let $f(a), f(a+h), \ldots, f(a+mh)$ be values of f(x) at m + 1
successive values of x.

Then by Taylor's expansion

\[ f(a+h) = f(a) + hf'(a) + \frac{h^2}{2!}f''(a) + \ldots \]

\[ f(a+2h) = f(a+h) + hf'(a+h) + \frac{h^2}{2!}f''(a+h) + \ldots \]

\[ \ldots \]

\[ f(a+mh) = f(a+\overline{m-1}\,h) + hf'(a+\overline{m-1}\,h) + \frac{h^2}{2!}f''(a+\overline{m-1}\,h) + \ldots \]

Write $d = a + \overline{m-1}\,h$, and $b = a + mh$, and add.

\[ f(b) - f(a) = h\,S_a^d F(x) + \frac{h^2}{2!}\,S_a^d F'(x) + \frac{h^3}{3!}\,S_a^d F''(x) + \ldots \]

where $F(x) = f'(x)$, and $\int F(x)\,dx = f(x)$ + constant.

Similarly

\[ h\,S_a^d F'(x) = \int_a^b F'(x)\,dx - \frac{h^2}{2!}\,S_a^d F''(x) - \ldots \]

and

\[ h\,S_a^d F''(x) = \int_a^b F''(x)\,dx - \ldots \]

Combining these equations, we have

\[ h\,S_a^d F(x) = \int_a^b F(x)\,dx - \frac{h}{2}\{F(b) - F(a)\} + \frac{h^2}{12}\{F'(b) - F'(a)\} + \text{terms involving } h^4, \]

and

\[ h\,S_a^b F(x) = \int_a^b F(x)\,dx + \frac{h}{2}\{F(b) + F(a)\} + \frac{h^2}{12}\{F'(b) - F'(a)\} + \text{terms involving } h^4 \quad . . . . (135) \]

[Figure]

In the figure let OA represent a, OB b, and AA' and BB' h.

AB = mh.

hF(a), hF(b) are the rectangular areas on AA', BB'. $h\,S_a^d F(x)$
is the sum of the rectangular areas on AB.

$\int_a^b F(x)\,dx$ is the curvilinear area on AB, and the term
$\frac{h}{2}[F(a) - F(b)]$ is a first approximation for the defect of the curved
from the rectilinear area.
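
Formula (135) may be tried on a function whose integral is known. The sketch below is an illustration under our own choice of test function, $F(x) = e^x$ between 0 and 1 with h = 0.1; the function name `euler_maclaurin_sides` is hypothetical and the caller supplies the primitive and the derivative.

```python
import math

def euler_maclaurin_sides(F, dF, primitive, a, b, m):
    """Return (left, right) sides of formula (135) with m equal steps from a to b."""
    h = (b - a) / m
    left = h * sum(F(a + k * h) for k in range(m + 1))  # sum includes both end ordinates
    right = (primitive(b) - primitive(a)
             + h / 2 * (F(b) + F(a))
             + h ** 2 / 12 * (dF(b) - dF(a)))
    return left, right

if __name__ == "__main__":
    left, right = euler_maclaurin_sides(math.exp, math.exp, math.exp, 0.0, 1.0, 10)
    print(left, right, left - right)
```

The two sides agree to about six decimal places, the difference being of the order of the neglected $h^4$ terms.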


Some difficulties arise in applying this theorem to the
curve of error.

Here $F(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}$, and, when x is not great compared
with σ, should be represented by a finite vertical length.

$\frac{1}{\sigma} = \frac{1}{\sqrt{pqn}}$, and $\frac{1}{\sqrt{n}}$, should be finite vertically.

The horizontal distance AB = mh should be finite. It is
found in the analysis on pp. 265-8 that the number of successes
above or below pn must be considered as of order $\sqrt{n}$; hence
m is of order $\sqrt{n}$ and h (the unit step) is of order $\frac{1}{\sqrt{n}}$. In
other words, in the drawing the rectangles must be supposed
so thin that it takes a number of them comparable with $\sqrt{n}$
to give a finite breadth.

In the equation (135) then h is of order $\frac{1}{\sqrt{n}}$, $S_a^b F(x)$ contains
$\sqrt{n}$ terms each finite, and therefore $h\,S_a^b F(x)$ is of order F(x),
as is $\int_a^b F(x)\,dx$.

The following terms on the right-hand side are successively
of orders $\frac{1}{\sqrt{n}}$, $\frac{1}{n}$, etc. (F'(b) is of course a simple numerical
ratio.)

Now give h its value unity, and we have, for the normal
curve of error, in which terms of order $\frac{1}{\sqrt{n}}$ are neglected, the
aggregate chance of successes from $pn + x_1$ to $pn + x_2$.

In the next approximation

    . . . . (136)

where terms in $\frac{1}{\sqrt{n}}$ are retained and terms in $\frac{1}{n}$ neglected. The
h term in the formula (135) must be retained.

The result is most conveniently given as the sum of the
chances of successes from pn to pn + x; for this purpose
suppose A in the figure to be a half-unit to left of G, where
OG = pn and G is the abscissa of the centre of gravity of the
curve (p. 437). Then let GB = x.

Sum of chances from G to B = sum from A to B — $ . P 0 

. f'P..ix+i(P,+PJ-tP 0 . 

J 0 


Write x =» x*.
Hence the sum of chances from

*vhen p is neglected. 



( 137 ) 


Here <r = Vpqn, * - 

(See Todhunter’s History of the Theory of Probability, 
Art. 993.) 


> 


5. — Dr. Sheppard's Corrections for the Moments of Frequency
Curves.

(See Biometrika, Vol. III., pp. 308 seq.)

Let y = f(x) be the equation of a continuous curve of frequency,
whose area is unity.

Let $A_p$ be the area standing on the base $x_p \pm \frac{h}{2}$, p being integral,
and let the values of $A_p$ for all values of p be known from the
observations.

The moment computed from the equation of the curve,
say $m_t$, $= \int_a^b x^t f(x)\,dx$, where a and b are the extreme values of x.

The moment computed from the observations, when each
area is taken as concentrated at the middle point of its grade,
say $\mu_t$, $= S\,x_p^t A_p$.

Required to find what correction should be made to $\mu_t$ to
obtain $m_t$.

\[ A_p = \int_{x_p - \frac{h}{2}}^{x_p + \frac{h}{2}} f(x)\,dx = \int_{-\frac{h}{2}}^{\frac{h}{2}} f(x_p + x)\,dx \]

\[ = \int_{-\frac{h}{2}}^{\frac{h}{2}}\left\{f(x_p) + xf'(x_p) + \frac{x^2}{2!}f''(x_p) + \ldots\right\}dx \]

\[ = hf(x_p) + \frac{h^3}{24}f''(x_p) + \frac{h^5}{1920}f^{(4)}(x_p) + \ldots \]

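Hence

\[ \mu_t = h\,S\,x_p^t f(x_p) + \frac{h^3}{24}\,S\,x_p^t f''(x_p) + \frac{h^5}{1920}\,S\,x_p^t f^{(4)}(x_p) + \ldots \]

= by the Euler-Maclaurin theorem (formula (135))

\[ \int_a^b x^t f(x)\,dx + \frac{h}{2}\{b^t f(b) + a^t f(a)\} + \frac{h^2}{12}\left[D(x^t f(x))\right]_a^b - \frac{h^4}{720}\left[D^3(x^t f(x))\right]_a^b \]

\[ + \frac{h^2}{24}\int_a^b x^t f''(x)\,dx + \frac{h^3}{48}\{b^t f''(b) + a^t f''(a)\} + \frac{h^4}{288}\left[D(x^t f''(x))\right]_a^b + \ldots \]

\[ + \frac{h^4}{1920}\int_a^b x^t f^{(4)}(x)\,dx + \text{terms involving } h^5. \]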

Now restrict the investigation to the case where the curve
touches the axis at both extremities, so that

\[ f(a) = 0 = f(b) = f'(a) = f'(b), \]

and let the contact be so close that also

\[ h^2 f''(a) = h^2 f''(b) = 0, \text{ and also } h^4 f^{(4)}(a) = h^4 f^{(4)}(b) = 0, \]

and in all these cases let the presence of a multiplier such as $a^t$
not make any significant difference from zero.

The expression reduces to

\[ \mu_t = \int_a^b x^t f(x)\,dx + \frac{h^2}{24}\int_a^b x^t f''(x)\,dx + \frac{h^4}{1920}\int_a^b x^t f^{(4)}(x)\,dx + \text{terms involving } h^5. \]

Then

\[ \int_a^b x^t f''(x)\,dx = \left[x^t f'(x)\right]_a^b - t\int_a^b x^{t-1}f'(x)\,dx = t(t-1)\,m_{t-2}, \]

and

\[ \int_a^b x^t f^{(4)}(x)\,dx = t(t-1)(t-2)(t-3)\,m_{t-4}, \]

by continual integration by parts and use of the conditions.

Since h is generally small in comparison with the moments,
terms involving $h^6$ can be neglected.

\[ \therefore\ \mu_t = m_t + \frac{h^2}{24}\,t(t-1)\,m_{t-2} + \frac{h^4}{1920}\,t(t-1)(t-2)(t-3)\,m_{t-4} \]

approximately.

Giving t the values 0, 1, 2, 3, 4 in succession, we have

$\mu_0 = m_0$ = area of curve = 1.

$\mu_1 = m_1$ = zero if the equation is referred to the vertical through
the average.

\[ \mu_2 = m_2 + \frac{h^2}{12}, \]

\[ \mu_3 = m_3 + \frac{h^2}{4}m_1 = m_3 \text{ if } m_1 = 0, \]

\[ \mu_4 = m_4 + \frac{h^2}{2}m_2 + \frac{h^4}{80} = m_4 + \frac{h^2}{2}\left(\mu_2 - \frac{h^2}{12}\right) + \frac{h^4}{80}. \]

\[ \therefore\ m_2 = \mu_2 - \frac{h^2}{12}, \quad m_4 = \mu_4 - \frac{h^2}{2}\mu_2 + \frac{7h^4}{240}, \]

and $m_1 = \mu_1$, $m_3 = \mu_3$ when the moments are taken about the
vertical through the average.
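
The corrections may be illustrated on the normal curve, which satisfies the conditions of close contact. In the sketch below, which is ours and not Dr. Sheppard's, a normal curve of unit standard deviation is grouped in grades of breadth h = 0.5, each area is concentrated at the middle of its grade, and the corrected moments are compared with the true values 1 and 3. The function names and the choice of h are assumptions made for the illustration.

```python
import math

def normal_area(lower, upper):
    """Area of the unit normal curve between two abscissae."""
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return cdf(upper) - cdf(lower)

def grouped_moments(h, half_width=10.0):
    """Second and fourth moments with each grade's area placed at its middle point."""
    grades = int(round(half_width / h))
    mu2 = mu4 = 0.0
    for p in range(-grades, grades + 1):
        x_p = p * h
        area = normal_area(x_p - h / 2, x_p + h / 2)
        mu2 += x_p ** 2 * area
        mu4 += x_p ** 4 * area
    return mu2, mu4

def sheppard_corrected(mu2, mu4, h):
    """m2 = mu2 - h^2/12 and m4 = mu4 - h^2 mu2 / 2 + 7 h^4 / 240."""
    return mu2 - h ** 2 / 12, mu4 - h ** 2 * mu2 / 2 + 7 * h ** 4 / 240

if __name__ == "__main__":
    h = 0.5
    mu2, mu4 = grouped_moments(h)
    print("grouped:  ", mu2, mu4)
    print("corrected:", sheppard_corrected(mu2, mu4, h))
```

The uncorrected second moment exceeds unity by $h^2/12$, or about .021, and the correction restores it.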


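6. — The Moments and Constants of the Second Approximation
to the Generalised Curve of Error.

The equation to the curve is

\[ y = \frac{1}{s\sqrt{2\pi}}\,e^{-\frac{x^2}{2s^2}}\left\{1 - \frac{\kappa}{2}\left(\frac{x}{s} - \frac{x^3}{3s^3}\right)\right\}. \]

Write $m_p$ for $\int_{-\infty}^{\infty}\frac{1}{s\sqrt{2\pi}}\,e^{-\frac{x^2}{2s^2}}\,x^p\,dx$.

Then $m_0 = 1$, $m_1 = m_3 = \ldots = m_{2p+1} = \ldots = 0$, $m_2 = s^2$,
$m_{2p} = 1\cdot3\cdot5\ldots(2p-1)\cdot s^{2p}$ (formula (23)).

Write $M_p$ for the p-th moment of the second approximation.

Then

\[ M_p = \int_{-\infty}^{\infty}\frac{1}{s\sqrt{2\pi}}\,e^{-\frac{x^2}{2s^2}}x^p\,dx - \frac{\kappa}{2s}\int_{-\infty}^{\infty}\frac{1}{s\sqrt{2\pi}}\,e^{-\frac{x^2}{2s^2}}x^{p+1}\,dx + \frac{\kappa}{6s^3}\int_{-\infty}^{\infty}\frac{1}{s\sqrt{2\pi}}\,e^{-\frac{x^2}{2s^2}}x^{p+3}\,dx \]

\[ = m_p - \frac{\kappa}{2s}\,m_{p+1} + \frac{\kappa}{6s^3}\,m_{p+3}. \]

$\therefore\ M_{2p} = m_{2p}$, since $m_{2p+1} = 0 = m_{2p+3}$,
and therefore even moments are not affected by the inclusion of
the κ term.

\[ M_2 = s^2 \quad . . . . (140) \]

\[ M_{2p+1} = -\frac{\kappa}{2s}\,m_{2p+2} + \frac{\kappa}{6s^3}\,m_{2p+4}, \text{ since } m_{2p+1} = 0. \]

\[ \therefore\ M_1 = -\frac{\kappa}{2s}\left(s^2 - \frac{1}{3s^2}\cdot3s^4\right) = 0, \]

\[ M_3 = -\frac{\kappa}{2s}\left(3s^4 - \frac{1}{3s^2}\cdot15s^6\right) = \kappa s^3 \quad . . . . (141) \]

\[ M_{2p+1} = -\frac{\kappa}{2s}\cdot1\cdot3\cdot5\ldots(2p+1)\cdot s^{2p+2}\left\{1 - \frac{(2p+3)s^2}{3s^2}\right\} = \frac{\kappa}{3}\cdot p\cdot1\cdot3\cdot5\ldots(2p+1)\cdot s^{2p+1} \quad . . . . (142) \]

The origin is the average of the curve, since $M_1 = 0$.

To find the mode we must equate $\frac{dy}{dx}$ to zero.

\[ \log y = \text{constant} - \frac{x^2}{2s^2} + \log\left\{1 - \frac{\kappa}{2}\cdot\frac{x}{s}\left(1 - \frac{x^2}{3s^2}\right)\right\} = \text{constant} - \frac{x^2}{2s^2} - \frac{\kappa x}{2s}\left(1 - \frac{x^2}{3s^2}\right), \]

since $\kappa^2$ is of the order $\frac{1}{n}$ and neglected in the analysis of p. 295.

\[ \frac{1}{y}\frac{dy}{dx} = -\frac{x}{s^2} - \frac{\kappa}{2s}\left(1 - \frac{x^2}{s^2}\right) \quad . . . . (143) \]

whence $x = -\frac{1}{2}\kappa s$, neglecting $\kappa^2$.

\[ \therefore\ \text{distance that average is to the right of mode} = \tfrac{1}{2}\kappa s \quad . . . . (144) \]

Then area of the curve standing on the base ON, where
ON = x = zs, is given by

\[ Y_0^z = \frac{1}{\sqrt{2\pi}}\int_0^z e^{-\frac{1}{2}z^2}\left\{1 - \frac{\kappa}{2}\left(z - \frac{z^3}{3}\right)\right\}dz = F(z) - \kappa f(z), \]

where

\[ F(z) = \frac{1}{\sqrt{2\pi}}\int_0^z e^{-\frac{1}{2}z^2}\,dz \]

and

\[ f(z) = \frac{1}{6\sqrt{2\pi}}\left\{1 - (1 - z^2)\,e^{-\frac{1}{2}z^2}\right\}. \]

These functions are tabulated on p. 271 and p. 303.

\[ Y_{-z}^0 = F(z) + \kappa f(z) \]

and the whole chance from −z to +z, $Y_{-z}^{z}$, is 2F(z), as in the
normal curve.*

* The corresponding formula from the p, q, n hypothesis, using the
Euler-Maclaurin theorem, is $2F(z) + \frac{e^{-\frac{1}{2}z^2}}{s\sqrt{2\pi}}$; but when the data are continuous
the last term drops out.
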
EXAMPLE OF THE SKEW CURVE OF ERROR.

[Figure]

If M is the position of the median we have

\[ \tfrac{1}{2} + \text{area on MO} = Y_{-\infty}^{0}, \text{ and } \tfrac{1}{2} - \text{area on MO} = Y_{0}^{\infty}. \]

\[ \therefore\ 2\ \text{area on MO} = Y_{-\infty}^{0} - Y_{0}^{\infty} = \kappa\{f(-\infty) + f(\infty)\} = \frac{\kappa}{3\sqrt{2\pi}}. \]

The ordinates on the small base OM differ from the ordinate at
O, viz., $\frac{1}{s\sqrt{2\pi}}$, only by terms involving κ.

\[ \therefore\ 2\,MO \times \frac{1}{s\sqrt{2\pi}} = \frac{\kappa}{3\sqrt{2\pi}}, \text{ when } \kappa^2 \text{ is neglected,} \]

and $MO = \frac{\kappa s}{6} = \frac{1}{3}$ (distance from mode to average) . . . . (145)


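Let the area on MN, where M is the median and N any point,
equal $F(z_1)$, the area of the normal curve on $ON_1$, i.e., where
$x_1 = ON_1 = z_1 s$, and let $NN_1 = v$, where v, as can be seen in the following
analysis, is small and of the order κ.

\[ ON = x_1 - v. \]

Then

\[ F(z_1) = \text{area on MO} + \text{area on } (x_1 - v) \]

\[ = \frac{\kappa}{6\sqrt{2\pi}} + \frac{1}{s\sqrt{2\pi}}\int_0^{x_1 - v} e^{-\frac{x^2}{2s^2}}\left\{1 - \frac{\kappa}{2}\left(\frac{x}{s} - \frac{x^3}{3s^3}\right)\right\}dx \]

\[ = \frac{\kappa}{6\sqrt{2\pi}} + F(z_1) - \frac{v}{s\sqrt{2\pi}}\,e^{-\frac{x_1^2}{2s^2}} - \frac{\kappa}{6\sqrt{2\pi}} + \frac{\kappa}{6\sqrt{2\pi}}\left(1 - \frac{x_1^2}{s^2}\right)e^{-\frac{x_1^2}{2s^2}}, \]

where terms of order $v^2$, $\kappa^2$ and $v\kappa$ are neglected.

\[ \therefore\ v = \frac{\kappa s}{6}\left(1 - \frac{x_1^2}{s^2}\right). \]
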


The average, s, and κ can be obtained if we know the
relative number of observations from the lowest to each of
three positions on the horizontal scale, and if we can assume
the equation of the frequency curve is that here in question.

The method is most readily explained if we take a numerical
example. On p. 309 we have

Limits of Age.        Number of Children.

0 to 13 years         .296 of 3044
0 to 15   „           .867  „   „
0 to 16   „           .969  „   „

Let m be the median age, s years the standard deviation, and
$\kappa s^3$ = third moment, all unknown.

In the figure above let M represent the median age and N the
age 15 years.

The area on MN is then .867 − .500 = F(1.112), (p. 271).

Hence

\[ \frac{ON_1}{s} = 1.112 = \frac{x_1}{s} = z_1, \]

\[ 15 - m = MN = MO + ON_1 - NN_1 = \tfrac{1}{6}\kappa s + x_1 - \tfrac{1}{6}\kappa s\left(1 - \frac{x_1^2}{s^2}\right), \]

\[ \therefore\ 15 - m = z_1 s + \tfrac{1}{6}\kappa s z_1^2, \text{ where } z_1 = 1.112. \]

Similarly

\[ 16 - m = z_2 s + \tfrac{1}{6}\kappa s z_2^2, \text{ where } z_2 = 1.866, \]

and

\[ m - 13 = z_3 s - \tfrac{1}{6}\kappa s z_3^2, \text{ where } F(z_3) = .204 \text{ and } z_3 = .536. \]

A little consideration will show that the negative sign
must be taken when N is to the left of M.

We have now three equations for determining m, s, and κ.

\[ \tfrac{1}{6}\kappa s(z_1 - z_3) + s = \frac{2}{z_1 + z_3}, \]

\[ \tfrac{1}{6}\kappa s(z_2 - z_3) + s = \frac{3}{z_2 + z_3}. \]

∴ κs = .278, s = 1.187, κ = .234, m = 13.623.
Average = m + ⅙κs = 13.669.

(Compare Statistical Journal, 1902, pp. 339 to 348.)

From moments depending on the whole nine grades, it was
found that s = 1.190, κ = .206, and average = 13.665.
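
The solution of the three equations can be retraced mechanically. The sketch below is an illustration only; the function name `fit_skew_curve` and its argument names are ours, and the three values of z are those read from the table of F(z) in the text. Working to more figures than the text carries, it reproduces s, m and the average to three decimal places, while κ comes out at about .237 instead of .234, a difference attributable to rounding in the printed working.

```python
def fit_skew_curve(x_low, x_mid, x_high, z_low, z_mid, z_high):
    """Solve  x_mid  - m = z_mid  s + c z_mid^2,
              x_high - m = z_high s + c z_high^2,
              m - x_low  = z_low  s - c z_low^2,   where c = kappa * s / 6."""
    a = (x_mid - x_low) / (z_mid + z_low)     # = s + c (z_mid - z_low)
    b = (x_high - x_low) / (z_high + z_low)   # = s + c (z_high - z_low)
    c = (b - a) / (z_high - z_mid)
    s = a - c * (z_mid - z_low)
    m = x_low + z_low * s - c * z_low ** 2
    return {"s": s, "kappa": 6 * c / s, "median": m, "average": m + c}

result = fit_skew_curve(13, 15, 16, 0.536, 1.112, 1.866)

if __name__ == "__main__":
    for name, value in result.items():
        print(name, round(value, 3))
```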

If the average or median is known, or if the curve is known
to be normal and κ = 0, two observations are sufficient for
determining the remaining quantities.

Notice that the curves representing the first and second
approximations intersect at $x = s\sqrt{3}$.

The area of the skew curve standing on MN = the area of
the normal curve standing on ON when x = ± s.

The excess of the skew curve over the normal curve for
any distance ON to the right = the defect for the same distance
to the left.


7 . — Ratio of Unweighted Averages. 


Let $M_1, M_2 \ldots M_n$ be the true measurements of n quantities at
one time, and $M_1', M_2' \ldots$ of similar quantities at another time.

Let $n\bar{m} = S M_t$, $M_t = \bar{m} + m_t$, $S m_t = 0$, $n\bar{m}' = S M_t'$, $M_t' = \bar{m}' + m_t'$,
$S m_t' = 0$, $n\sigma_m^2 = S m_t^2$, $n\sigma_m'^2 = S m_t'^2$.

Let $\bar{m}' = \bar{m}(1 + \rho)$, $M_t' = (1 + u + u_t)M_t$ where $S u_t = 0$, and

\[ \rho = u + \frac{S m_t u_t}{n\bar{m}}. \]

Here u measures the mean of the ratio increases of the
quantities, and ρ measures the ratio increase in the mean of
the quantities. These tend to be equal, if the larger
quantities are not on the whole subject to the larger
increases, or conversely.

Suppose the quantities to be erroneously measured as 
M,(i +«,) and M/(i + «*') etc. Then by formula (70) the standard 


deviations of the errors in m and m' are ^V( I+5 3 v and 

yj(l + where <r and * t are typical of e% and e,'. 

If the errors in the two sets of measurements are inde- 
pendent of each other, then (by p. 318, formula (63)), 


Sr* = 




where s, is the standard deviation of errors in — , i e. in 1 + p. 

frt 

It is frequently the case, however, that the error $e_t'$ in
the measurement of $M_t'$ is of the same sign and not far from
the same magnitude as $e_t$, the error in the earlier measurement
of the corresponding $M_t$.

Write $d_t = e_t' - e_t$.

Then if e is the resulting error in the ratio of the averages, $\frac{\bar{m}'}{\bar{m}}$,

\[ (1 + e)\,\frac{\bar{m}'}{\bar{m}} = \frac{S\{M_t'(1 + e_t')\}}{S\{M_t(1 + e_t)\}}, \]

\[ e = \frac{S\{M_t'(1 + e_t')\}\cdot S M_t - S\{M_t(1 + e_t)\}\cdot S M_t'}{S\{M_t(1 + e_t)\}\cdot S M_t'} \]

\[ = \frac{\bar{m}\,S(M_t'e_t') - \bar{m}'\,S(M_te_t)}{\bar{m}'\,S\{M_t(1 + e_t)\}} \]

\[ = \frac{\bar{m}\,S(M_t'd_t) + S\{(\bar{m}M_t' - \bar{m}'M_t)e_t\}}{n\bar{m}\bar{m}'}, \]

neglecting $e_t^2$ and $e_te_t'$,

\[ = \frac{S(M_t'd_t)}{n\bar{m}'} + \frac{S\{(u_t + u - \rho)M_te_t\}}{n\bar{m}'} = \frac{S(M_t'd_t)}{n\bar{m}'} + \frac{S(M_tu_te_t)}{n\bar{m}'}, \]

if u − ρ is neglected.

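Hence if $s_r$ is the standard deviation of e, and $\sigma_d$, $\sigma$ the
standard deviations of $d_t$ and $e_t$, or their weighted standard
deviations if they are not all from identical frequency curves,

\[ s_r^2 = \frac{\sigma_d^2}{n^2\bar{m}'^2}\cdot S(M_t'^2) + \frac{\sigma^2}{n^2\bar{m}'^2}\cdot S(M_t^2u_t^2), \text{ by formula (55),} \]

if $d_t$ and $e_t$ are uncorrelated.

Now $S(M_t')^2 = S(\bar{m}' + m_t')^2 = n(\bar{m}'^2 + \sigma_m'^2)$,
and $S(M_t^2u_t^2) = n\sigma_u^2(\bar{m}^2 + \sigma_m^2) + S u_t^2(m_t^2 - \sigma_m^2) + 2\bar{m}\,S(m_tu_t^2)$,
where $n\sigma_u^2 = S u_t^2$,

\[ \therefore\ s_r^2 = \frac{\sigma_d^2}{n}\left(1 + \frac{\sigma_m'^2}{\bar{m}'^2}\right) + \frac{\sigma^2}{n^2\bar{m}'^2}\cdot n\sigma_u^2(\bar{m}^2 + \sigma_m^2) + \text{terms which tend to be negligible} \]

\[ = \frac{1}{n}\left\{\sigma_d^2\left(1 + \frac{\sigma_m'^2}{\bar{m}'^2}\right) + \sigma^2\sigma_u^2\left(1 + \frac{\sigma_m^2}{\bar{m}^2}\right)\frac{1}{(1 + \rho)^2}\right\} \text{ approx.} \quad . . . . (146) \]
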

If $e_t$ and $e_t'$ were independent $\sigma_d^2$ would equal $\sigma^2 + \sigma_1^2$,
while if $e_t = e_t'$ etc. $\sigma_d$ would be zero. Hence $\sigma_d$ may be
regarded as between 0 and $\sigma\sqrt{2}$.

The magnitude of the second term depends on $\sigma_u$, which
measures the variation in the rates of increase of the different
quantities, and is known from the observations.

Hence if similar errors are made in observations at both
dates of quantities which increase at nearly the same rates,
the error in the ratio of the computed averages is small, and,
if n is great, very small.

8. — Ratio of Weighted Averages.

In the case of weighted averages the formula becomes more
complex.

Let $W_t = \bar{w} + w_t$, and $W_t' = \bar{w}' + w_t'$ be any pair of weights at
the two dates, where $\bar{w}$, $\bar{w}'$ are the averages of the weights. Write
$n\sigma_w^2 = S w_t^2$ and $n\sigma_w'^2 = S w_t'^2$.

Let $W_t' = W_t(1 + v + v_t)$, where $S v_t = 0$, and write $n\sigma_v^2 = S v_t^2$.

Suppose $W_t(1 + \eta_t)$ and $W_t'(1 + \eta_t')$ to be taken in error for
$W_t$, $W_t'$, and write σ' for the standard deviation of $\eta_t$.

Let $\bar{m}_w = \frac{S(W_tM_t)}{S(W_t)}$, and $\bar{m}_w' = \frac{S(W_t'M_t')}{S(W_t')}$.

Other letters have the same meaning as in the previous note.

Required the error in $\frac{\bar{m}_w'}{\bar{m}_w}$, say e.

\[ (1 + e)\,\frac{\bar{m}_w'}{\bar{m}_w} = \frac{S\{W_t'(1 + \eta_t')M_t'(1 + e_t')\}}{S\{W_t(1 + \eta_t)M_t(1 + e_t)\}}\cdot\frac{S\{W_t(1 + \eta_t)\}}{S\{W_t'(1 + \eta_t')\}}, \]

and hence, after reduction in which products and squares of $\eta_t$, $e_t$
are neglected,

\[ e = \frac{S(W_t'M_t'e_t')}{n\bar{w}'\bar{m}_w'} - \frac{S(W_tM_te_t)}{n\bar{w}\bar{m}_w} + \frac{S\{W_t(\bar{m}_w - M_t)\eta_t\}}{n\bar{w}\bar{m}_w} - \frac{S\{W_t'(\bar{m}_w' - M_t')\eta_t'\}}{n\bar{w}'\bar{m}_w'} \quad . . . . (147) \]

To obtain approximate results neglect all sums of products,
where the sum of the factors of one kind is zero. This leads to
taking $\bar{m}_w = \bar{m}$, $\bar{m}_w' = \bar{m}'$, u = ρ, $\bar{w}' = (1 + v)\bar{w}$, and to further
simplifications in the reduction.

Write $d_t' = \eta_t' - \eta_t$ and $\sigma_d'$ for its standard deviation.


Then e 


S (WtWdt) SJWe'Mtui) S(W Mtvt) 

nw'ih' nw'ih' Ct nw'ih 61 

S(Wt'mt'dt') S(Wt'Mm) S(Wmevt) 

* nw'ih nw'm' *** nw'ih ***' 


Hence approximately

The terms involving $\sigma^2$, $\sigma_d^2$ (which measure the errors in
the quantities $M_t$) are similar to those when the average is
unweighted, except for a factor (greater than 1 and generally
less than 2) involving weights, and a term involving also the
small factor $\sigma_v^2$ which measures the variation in the change of
weights.

Of the three terms involving $\sigma'^2$, $\sigma_d'^2$ (which measure the
errors in weights) the first and the third contain the factors
$\left(\frac{\sigma_m}{\bar{m}}\right)^2$ and $\left(\frac{\sigma_m'}{\bar{m}'}\right)^2$ respectively, which are small when the M’s
are little dispersed, and the second involves $\sigma_u^2$, which is
small when the rates of increase of the quantities are nearly
equal.

The actual values of all the coefficients of $\sigma$, $\sigma'$, $\sigma_d$, $\sigma_d'$ can
be obtained from the observations, and their relative importance
discovered; but we can say without evaluation that when
quantities little dispersed increase at rates not far from equal,
errors in weights have little importance as compared with
equal errors in quantities.

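In such cases a first approximation would be

\[ s_r = \frac{\sigma_d}{\sqrt{n}}\sqrt{1 + \left(\frac{\sigma_m}{\bar{m}}\right)^2} \quad . . . . (149) \]

but if $\frac{\sigma_u}{1 + u}$ is not small, a better approximation would be

\[ s_r = \frac{1}{\sqrt{n}}\sqrt{\left[\left(1 + \left(\frac{\sigma_m}{\bar{m}}\right)^2\right)\left\{\sigma_d^2 + \left(\frac{\sigma_u}{1 + u}\right)^2(\sigma^2 + \sigma'^2)\right\}\right]} \quad . . . . (150) \]

It is seldom that $\sigma_d$, which measures the difference of errors,
is small compared with one error, though it is likely to be less
than $\sqrt{2}\cdot\sigma$.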

It is advisable to test the coefficients roughly from the 
observations before neglecting terms ; and also where there 
are any signs that the neglected products are not small, or 
any of the errors are likely to be specially large, the unabridged 
form (147) should be used. 

(See Statistical Journal, 1911-12, pp. 81-88, “Measurement
of the Accuracy of an Average.”)

9. — Normality of Standard Deviations of the Errors in Moments,
etc.

[Based on Sheppard's “Application of the Theory of
Error,” Transactions of the Royal Society, Vol. 192, 189?,
A. 229, pp. 117-128, but with modifications in notation and
treatment.]
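In a universe containing N things $p_1N$ are at $x_1$, $p_2N$ at $x_2$ . . .
$p_1 + p_2 + \ldots = 1$; $F = a_1p_1 + a_2p_2 + \ldots$ where $a_1, a_2 \ldots$ are
constants.

In a selection of n things, $n_1$ are found at $x_1$, $n_2$ at $x_2$ . . . ,
$n_1 + n_2 + \ldots = n$.

Write

\[ F + f = a_1\frac{n_1}{n} + a_2\frac{n_2}{n} + \ldots, \quad \text{so that } f = b_1\frac{n_1}{n} + b_2\frac{n_2}{n} + \ldots, \]

where $b_1 = a_1 - F$ etc.

Then $S\,b_tp_t = S\,a_tp_t - F\cdot S\,p_t = F - F = 0$,

$S\,b_t^2p_t = S\,a_t^2p_t - 2F\,S\,a_tp_t + F^2\cdot S\,p_t = S\,a_t^2p_t - F^2$.

Required to find $M_s$, = mean $f^s$, and to show that its relation to
$M_2$, = mean $f^2$, is that found in the normal curve of error.

The expression

\[ E = \left(p_1e^{\frac{\alpha b_1}{n}} + p_2e^{\frac{\alpha b_2}{n}} + \ldots\right)^n, \]

expanded by the multinomial theorem, gives the sum of terms

\[ \frac{n!}{n_1!\,n_2!\ldots}\;p_1^{n_1}p_2^{n_2}\ldots\;e^{\frac{\alpha}{n}(b_1n_1 + b_2n_2 + \ldots)} \]

subject to the condition $n_1 + n_2 + \ldots = n$

\[ = \text{sum of terms } P\cdot e^{\alpha f}, \]

where $f = b_1\frac{n_1}{n} + b_2\frac{n_2}{n} + \ldots$,

and $P = \frac{n!}{n_1!\,n_2!\ldots}\;p_1^{n_1}p_2^{n_2}\ldots$,

and is the whole chance that the selection $n_1$ at $x_1$, $n_2$ at $x_2$ . . .
should be made, as may be seen by expanding the multinomial
$(p_1 + p_2 + \ldots)^n$.

\[ E = \text{sum of } P\left(1 + \alpha f + \frac{\alpha^2f^2}{2!} + \ldots + \frac{\alpha^sf^s}{s!} + \ldots\right) = M_0 + \alpha M_1 + \frac{\alpha^2}{2!}M_2 + \ldots + \frac{\alpha^s}{s!}M_s + \ldots \]

Also

\[ E = \left(S\,p_t + \frac{\alpha}{n}\,S\,b_tp_t + \frac{\alpha^2}{2!\,n^2}\,S\,b_t^2p_t + \frac{\alpha^3}{3!\,n^3}\,C_3 + \frac{\alpha^4}{4!\,n^4}\,C_4 + \ldots\right)^n \]

from the expansion of the terms $e^{\frac{\alpha b_1}{n}}$ . . . , where
$C_3 = S\,b_t^3p_t$, $C_4 = S\,b_t^4p_t$ . . . , and $S\,p_t = 1$, $S\,b_tp_t = 0$.

\[ \therefore\ E = \left(1 + \frac{\alpha^2}{2n^2}\,S\,b_t^2p_t + \ldots\right)^n. \]

Equating the first three coefficients in the two expressions for E,

\[ M_0 = 1, \quad M_1 = 0, \quad M_2 = \frac{1}{n}\,S\,b_t^2p_t. \]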


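$$\therefore\; 1 + \frac{\alpha^2}{2!}M_2 + \frac{\alpha^3}{3!}M_3 + \dots = \left(1 + \frac{\alpha^2}{2n}M_2 + \frac{\alpha^3}{3!}\frac{C_3}{n^3} + \frac{\alpha^4}{4!}\frac{C_4}{n^4} + \dots\right)^n.$$

Now when n is large, and $M_2$, $C_3/n^{3/2}$, $C_4/n^{2}$, $C_5/n^{5/2}$ . . . are finite, we have,
if we neglect $1/\sqrt{n}$,

$$1 + \frac{\alpha^2}{2!}M_2 + \dots + \frac{\alpha^s}{s!}M_s + \dots = \left(1 + \frac{\alpha^2}{2n}M_2\right)^n = e^{\frac12\alpha^2M_2}.$$

Hence, in this case, $M_s = 0$ if s is odd, and $M_{2s} = \frac{(2s)!}{2^s\,s!}M_2^{\,s}$, as
in the normal curve of error.

The conditions that $C_3/n^{3/2}$ etc. are finite are similar to those
in the Edgeworthian analysis on pp. 295 seq., but need consideration
for each case to which the theorem is applied.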

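Thus on pp. 419-20 $f = x_1^2e_1 + x_2^2e_2 + \dots$, where $e_1$ is $\frac{n_1}{n} - p_1$,
F is $\mu_2$, the second moment of the universe from which the
selection is made, and $b_t = x_t^2 - \mu_2$.

$$M_2 = \frac1n S(x_t^2 - \mu_2)^2p_t = \frac1n(\mu_4 - \mu_2^2)$$

$$C_3 = S(x_t^2 - \mu_2)^3p_t = (\mu_6 - 3\mu_4\mu_2 + 2\mu_2^3)$$

$$\therefore\; \frac{C_3}{(nM_2)^{3/2}} = \frac{\mu_6 - 3\mu_4\mu_2 + 2\mu_2^3}{(\mu_4 - \mu_2^2)^{3/2}}.$$

Similarly
$$\frac{C_4}{(nM_2)^{2}} = \frac{\mu_8 - 4\mu_6\mu_2 + 6\mu_4\mu_2^2 - 3\mu_2^4}{(\mu_4 - \mu_2^2)^{2}}.$$

Now if the ratios $\frac{\mu_4}{\sigma^4}, \frac{\mu_6}{\sigma^6}, \frac{\mu_8}{\sigma^8}$ . . . are finite, where $\sigma^2 = \mu_2$, we
have that $C_3/n^{3/2}$, $C_4/n^{2}$ . . . are finite, as was required.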

Hence if the curve of frequency of the universe satisfies
these conditions, which correspond in fact to a reasonable
concentration about the average, with no groups of importance
beyond a small multiple of $\sigma$, the curve of frequency for the
errors of the second moment (and of the standard deviation)
is normal.

A similar but simpler analysis shows the errors of the
average have normal frequency ($f = x_1e_1$ etc., $F = 0$, $b_t = x_t$).

In the case of the analysis of the correlation coefficient 
(p. 422) 




M 13 + 


— — 4.^4. Jt 1 

« \M a ^ 4V 1 ' r 4^ 2 Ma. Mix 1 2kpJ 

* VM 2 A. 2p) zt ~^- Vm s + 8A* + ,, T 


Wntmg A = 0-,*, /j.=*= 0.*, we have, if 

«Y 0,' 

C s 

all values of 5 and t, then ^ = M 2 * x finite quantity, and higher 


, are finite for 
<V°Y 


terms can be similarly dealt with as before. 

Hence if the moments and products of the two-dimensional 
frequency distributions satisfy the conditions already described, the 
error curve of the correlation coefficient is normal. 


10. — The Method of Least Squares.

This is a method that has for a long time been used for
assigning the values to be taken when there are a number of
inexact measurements at choice.

Suppose a quantity z to be related to k unknown constants
$x_1, x_2 \dots x_k$ by the equation $z = u_1x_1 + u_2x_2 + \dots + u_kx_k$, where
$u_1, u_2 \dots$ are quantities that can be observed; and let n sets
of observations be made giving

$${}_1z = {}_1u_1x_1 + {}_1u_2x_2 + \dots + {}_1u_kx_k$$
$$\dots$$
$${}_nz = {}_nu_1x_1 + {}_nu_2x_2 + \dots + {}_nu_kx_k$$

where the z's and u's are known.

If n = k, the x's can be exactly determined. If n < k, there
are an infinite number of solutions and the equations are
indeterminate.

If n > k the equations are in general inconsistent, and the
problem is to assign values to $x_1$ . . . which minimise the
inconsistency, which is assumed to be due to imperfect measurements
of the u's.

Write $d_1, d_2 \dots$ for the differences between ${}_1z, {}_2z \dots$ and the
values obtained from true values of $x_1, x_2 \dots$, say $X_1, X_2 \dots$

Then
$${}_1u_1X_1 + {}_1u_2X_2 + \dots + {}_1u_kX_k - {}_1z = d_1$$
$${}_2u_1X_1 + {}_2u_2X_2 + \dots + {}_2u_kX_k - {}_2z = d_2$$


It is assumed that $d_1, d_2 \dots$ are errors whose chances are
given by a normal curve $P = \frac{1}{\sigma\sqrt{2\pi}}e^{-d^2/2\sigma^2}$. The assumption
is generally based on demonstrations that under certain
hypotheses as to the nature of accidental errors this normal
form is obtained. Whatever may be the validity of these
hypotheses in physical or geodetical measurements, it is not
safe to assume that they apply to statistical or biometric
measurements, whether of deviations from an average or
errors due to sampling.

The solution is obtained by finding those values of $X_1, X_2 \dots$
which make the probability that $d_1, d_2 \dots$ would occur together
a maximum, that is which make the sum $d_1^2 + d_2^2 + \dots$ a
minimum. Write $f(d_1, d_2 \dots)$ for this sum.


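The conditions for a minimum are
$$\frac{\partial f}{\partial X_1} = 0 = \frac{\partial f}{\partial X_2} = \dots$$
These give
$$S\,({}_tu_1\,.\,d_t) = 0, \quad S\,({}_tu_2\,.\,d_t) = 0, \;\dots$$
which may be written

$$X_1\,.\,S\,u_1^2 + X_2\,.\,S\,u_1u_2 + \dots + X_k\,.\,S\,u_1u_k = S\,u_1z$$
$$X_1\,.\,S\,u_1u_2 + X_2\,.\,S\,u_2^2 + \dots + X_k\,.\,S\,u_2u_k = S\,u_2z$$
$$\dots$$
$$X_1\,.\,S\,u_1u_k + X_2\,.\,S\,u_2u_k + \dots + X_k\,.\,S\,u_k^2 = S\,u_kz,$$

k equations giving the k quantities $X_1, X_2 \dots$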
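
The formation and solution of these k equations can be sketched in a few lines of Python. The observations below are hypothetical, and the helper names are illustrative only; the elimination routine is an ordinary textbook one, not a method taken from the text.

```python
def normal_equations(u_rows, z):
    """Form the sums S(u_i u_j) and S(u_i z) from n rows of k observed u's."""
    k = len(u_rows[0])
    a = [[sum(row[i] * row[j] for row in u_rows) for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * zt for row, zt in zip(u_rows, z)) for i in range(k)]
    return a, rhs


def solve(a, rhs):
    """Solve the k linear equations by Gaussian elimination with pivoting."""
    k = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (m[i][k] - sum(m[i][j] * x[j] for j in range(i + 1, k))) / m[i][i]
    return x


# Hypothetical data: n = 4 observations, k = 2 unknowns.
u_rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
z = [2.0, 3.1, 4.9, 7.2]

a, rhs = normal_equations(u_rows, z)
X = solve(a, rhs)
# d_t = (value computed from X) - (observed z)
d = [sum(ui * xi for ui, xi in zip(row, X)) - zt for row, zt in zip(u_rows, z)]
print(X, d)
```

At the solution each sum S(u_i . d) vanishes, which is the minimum condition in another form.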

This method is found in practice to give quite generally 
good empirical values of the unknown quantities. It is used 
above on pp. 239, 240 in its simple form, and a corresponding 
method where the validity can be tested is used on pp. 364-5. 

(See Merriman, Method of Least Squares, and Weld, Theory
of Errors and Least Squares.)

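11. — Simpler Method of Obtaining Formula (130), p. 429.

Beginning at the top of p. 428 we may write by the Multinomial
Theorem that the joint chance of finding the errors
$e_1 \dots e_t \dots e_n$ is

$$P = \frac{N!}{(m_1 + e_1)!\dots(m_t + e_t)!\dots(m_n + e_n)!}\;p_1^{\,m_1 + e_1}\dots p_t^{\,m_t + e_t}\dots p_n^{\,m_n + e_n}.$$

Suppose the greatest of the terms $m_t^{-1}$ to be negligible, and
apply Stirling's formula to the factorials. After reduction we have

$$\frac{P}{K} = \prod_t\left(1 + \frac{e_t}{m_t}\right)^{-(m_t + e_t + \frac12)},$$

where $K^{-1} = (2\pi)^{\frac{n-1}{2}}(m_1 \dots m_t \dots m_n)^{\frac12}N^{-\frac12}$.

Then
$$\log (P \div K) = -\Sigma\,(m_t + e_t + \tfrac12)\log\left(1 + \frac{e_t}{m_t}\right)$$
$$= -\Sigma\,(m_t + e_t + \tfrac12)\left(\frac{e_t}{m_t} - \frac{e_t^2}{2m_t^2} + \frac{e_t^3}{3m_t^3} - \dots\right)$$
$$= -\Sigma\,e_t - \Sigma\,\frac{e_t^2}{2m_t} \pm \text{terms of order } m_t^{-\frac12},$$
since $e_t$ is of order $m_t^{\frac12}$.

If terms of order $m_t^{-\frac12}$ are negligible, then since $\Sigma\,e_t = 0$,

$$P = K\,e^{-\Sigma\frac{e_t^2}{2m_t}},$$

as in formula (130).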





SUPPLEMENTS. 


Supplement I. Kurtosis as Measured by $\kappa_2$. Illustrations.
(See p. 252.)

$\kappa_2$ cannot be less than 1, since $n(a^4 + b^4 + \dots$ to n
terms) is greater than $(a^2 + b^2 + \dots$ to n terms$)^2$, unless
a = b = . . . when it equals 1. There is no upper limit
to it.

Eight diagrams (A) are drawn, so that all their areas are
equal and all their standard deviations equal, each diagram
being symmetrical, which yield ascending values of $\kappa_2$ from 1.1
to 9.0.

In diagrams 1 and 8, out of m + n observations m are at
zero and $\frac12 n$ at unit distance to left and right. In such a case
$\kappa_2 = 1 + m/n$. When m = 0, $\kappa_2 = 1$; as m increases, $\kappa_2$
increases without limit (n decreasing, since m + n is kept
constant). In 1, $m = \frac{1}{11}$, $n = \frac{10}{11}$, $\kappa_2 = 1.1$. In 8, $m = \frac89$,
$n = \frac19$, $\kappa_2 = 9$.

In diagram 2 the distribution is graded upwards from
zero at the centre to the extremes; $\kappa_2 = 1\frac13$.

In diagram 3 the observations are uniformly distributed
and $\kappa_2 = 1.8$.

Diagram 4 is a half ellipse, where the observations are more
numerous at the centre than at the extremes, $\kappa_2 = 2$.

In diagram 5 the number of observations increases uniformly
from the extremes to the centre. $\kappa_2 = 2.4$.

Diagram 6 shows the normal curve of error, with $\kappa_2 = 3$.

In diagram 7 the distribution is two elliptical quadrants
placed so as to touch at the central vertical. $\kappa_2 = 3.22$.
(Exactly $2(992 - 315\pi)(4 - \pi) \div 15(16 - 5\pi)^2$.)

All these results can be verified by direct integration. 
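
As a rough numerical check of these values, the following Python sketch computes the ratio of the fourth moment to the square of the second for several of the shapes by the midpoint rule. The densities are written on an arbitrary half-range of 1 (the ratio does not depend on scale), and the shape used for diagram 7 (the space left beneath two quarter-circles that meet in a cusp at the centre) is an assumption, chosen because it reproduces the exact expression quoted above.

```python
import math


def kappa2(density, half_range=1.0, steps=100_000):
    """mu4 / mu2**2 for a symmetrical density on [-half_range, half_range].

    Moments are taken about zero (the centre) by the midpoint rule.
    """
    width = 2 * half_range / steps
    area = m2 = m4 = 0.0
    for i in range(steps):
        x = -half_range + (i + 0.5) * width
        w = density(x) * width
        area += w
        m2 += w * x ** 2
        m4 += w * x ** 4
    return (m4 / area) / (m2 / area) ** 2


def kappa2_three_point(m, n):
    """Diagrams 1 and 8: m observations at zero, n/2 at unit distance each side."""
    mu2 = n / (m + n)
    mu4 = n / (m + n)
    return mu4 / mu2 ** 2


shapes = {
    "diagram 2": lambda x: abs(x),
    "diagram 3": lambda x: 1.0,
    "diagram 4": lambda x: math.sqrt(max(0.0, 1 - x * x)),
    "diagram 5": lambda x: 1 - abs(x),
    # Assumed shape for diagram 7 (see the note above).
    "diagram 7": lambda x: 1 - math.sqrt(max(0.0, 1 - (1 - abs(x)) ** 2)),
}
results = {name: kappa2(f) for name, f in shapes.items()}
exact_7 = 2 * (992 - 315 * math.pi) * (4 - math.pi) / (15 * (16 - 5 * math.pi) ** 2)

for name, value in results.items():
    print(name, round(value, 3))
print("diagram 1", kappa2_three_point(1 / 11, 10 / 11))
print("diagram 8", kappa2_three_point(8 / 9, 1 / 9))
```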


Supplement II. Correction for Crude Value of the
Mean Deviation.

(See pp. 111, 253, 439.)

When the observations are graded, as in the example on
p. 253, there is some difficulty in computing the mean deviation.
The crude method is to add the deviations measured
from the centre of the grade in which the average or median
is situated, disregarding their signs, and to divide by the
number of observations. Thus we have (4906 + 4032) ÷ 3404
= 2.6257 (unit 5 lbs.).

This process requires three corrections, one of which is
analogous to Sheppard's, of which the result is valid to the
same extent and under the same conditions as apply to the
use of his formulae (pp. 439 seq.). The second is to allow for
the contribution of the central grade, the third to the distance
from the centre of that grade to the point from which deviations
are measured.


[Diagram B.]


1. In any grade, ST, breadth h, the number of observations
is (proportional to) SRPT = $h \times y_s$, where $y_s$ is the height SR.

In the crude method of computing the mean deviation
these observations are taken as concentrated at N, the middle
point of ST, and their contribution to the sum of the deviations
about O (mid-point of the central grade KL) is $hy_sx_s$, where
ON = $x_s$.

If, in fact, the observations are distributed in a curve
y = f(x), as indicated by UV, the true contribution of this
group may be obtained as follows.

Take N as a temporary origin, write NQ = z, where Q is
any point between S and T.




The sum of the deviations arising from the grade ST is

$$\int_{-\frac h2}^{\frac h2} f(x_s + z)(x_s + z)\,dz = \int_{-\frac h2}^{\frac h2}\{f(x_s) + zf'(x_s) + \dots\}(x_s + z)\,dz$$

$$= hx_sf(x_s) + \frac{h^3}{12}f'(x_s) + \dots$$

$$= hx_sy_s + \frac{h^2}{12}\,.\,hf'(x_s) + \dots$$

$$= hx_sy_s - \frac{h^2}{12}\,\mathrm{UW}, \text{ approx.,}$$

where VW is horizontal, UV is taken as a straight line
(neglecting $f''(x_s)$), and therefore $f'(x_s) = -\mathrm{UW}/h$. To this
order of approximation UV bisects RP at H.

The contribution of each grade is therefore over-estimated
by such an amount as $h^2\,\mathrm{UW}/12$. The same result is reached
in the left of the figure.

Take h as the same for all grades.

If we suppose that the height to right and left diminishes,
so that such lines as UV reach the horizontal axis with sufficient
approximation, and that the net adjustment due to curvature
in the central grade is negligible, then the sum of UW's on
each side is approximately FL or EK = $N_0/h$, where $N_0$ is the
number of observations in the central grade.

We have therefore to subtract

$$2 \times \frac{h^2}{12} \times \frac{N_0}{h} = \tfrac16hN_0 \quad \dots \text{ (i)}$$

from the sum of the deviations crudely measured.

(2) The sum of the deviations in the central grade about O,
ignoring curvature, is

$$2 \times \tfrac12N_0 \times \tfrac14h = \tfrac14hN_0 \quad \dots \text{ (ii)},$$

which is to be added to the crude sum. (i) and (ii) together
lead to addition of $\tfrac{1}{12}hN_0$.

(3) Now measure the deviations from M between K and L,
where OM = d. Let $N_1$ and $N_{-1}$ be the number of observations
to the right of L and left of K respectively.

We have to add $N_{-1}d$ and subtract $N_1d$ for the outside
grades.

The whole contribution of the central grade is now

$$\frac{N_0}{h}\left\{\left(\frac h2 + d\right)\,.\,\tfrac12\left(\frac h2 + d\right) + \left(\frac h2 - d\right)\,.\,\tfrac12\left(\frac h2 - d\right)\right\} = N_0\left(\frac h4 + \frac{d^2}{h}\right)$$

instead of $\tfrac14hN_0$ as in (ii).
In all we have to add $\left(N_{-1} - N_1 + \frac dhN_0\right)d$ . . . (iii)

Assemble these results, writing N for the whole number of
observations, $\eta_0$ for the crude and $\eta$ for the corrected mean
deviation.

$$N\eta = N\eta_0 + \tfrac{1}{12}hN_0 + \left(N_{-1} - N_1 + \frac dhN_0\right)d.$$
This is a general result.

Write $\eta_m$ for the mean deviation from the median.

Here $\tfrac12N = N_{-1} + \left(\tfrac12 + \frac mh\right)N_0 = N_1 + \left(\tfrac12 - \frac mh\right)N_0$,

where m is the distance from O to the median.

$$\therefore\; N_{-1} - N_1 = -2mN_0/h.$$

Write $N_0 = lN$, so that l is the proportion of the observations
that fall in the central grade.

We have
$$\eta_m = \eta_0 + l\left(\frac{h}{12} - \frac{m^2}{h}\right).$$

For any other point in the same central grade

$$\eta = \eta_0 + l\left(\frac{h}{12} + \frac{d^2}{h}\right) - 2dml/h$$
$$= \eta_m + l(d - m)^2/h \quad \dots \text{ (iv)}$$

In the case $d = \bar x$, the distance the average is to the right of O,
this applies to deviations from the average, so that

$$\eta_a = \eta_m + l(\bar x - m)^2/h \quad \dots \text{ (v)}$$

In the example on p. 253, the median is approximately
$102\frac12 - \frac{40}{404}$ (of 5 lbs.). h = 1, if we take 5 lbs. as the unit.
$m = -\frac{40}{404} = -.099$.

$\eta_0 = 8938 \div 3404 = 2.6257$, $l = 404 \div 3404 = .119$

$$\eta_m = 2.6257 + .119\left(\tfrac{1}{12} - .099^2\right) = 2.6344$$

$\bar x = .2568$

$$\eta_a = \eta_m + .119(.2568 - .099)^2 = 2.6344 + .0030 = 2.6374$$
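
The correction to the median can be reproduced in a few lines. This is a minimal Python sketch of formula (iv), with the figures of the example inserted; the function name is illustrative, and only the step from the crude value to the mean deviation about the median is checked here.

```python
def corrected_mean_deviation(eta0, l, h, m, d):
    """Formula (iv): mean deviation about a point at distance d from O.

    eta0 : crude mean deviation measured from the centre O of the central grade
    l    : proportion of the observations in the central grade
    h    : breadth of a grade
    m    : distance of the median from O
    """
    eta_m = eta0 + l * (h / 12 - m * m / h)  # mean deviation about the median
    return eta_m + l * (d - m) ** 2 / h


eta0 = 8938 / 3404
l = 404 / 3404
m = -0.099  # median lies to the left of O, in units of 5 lbs.

eta_m = corrected_mean_deviation(eta0, l, 1.0, m, d=m)
print(round(eta0, 4), round(eta_m, 4))
```

Putting d = 0 returns the crude value with only the first two corrections applied, that is the crude value increased by lh/12.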
Supplement III. Mean Deviation and Mean Difference.*
Lorenz' and Pareto's Curves.

(See pp. 114, 346.)

Let y = f(x) be the equation of a continuous frequency
curve, ranging from x = h to x = k.

Write $Y = F(x) = \int_x^k f(x)\,dx$, i.e. the number of observations
above x. Then N, the whole number of observations, = F(h).

[Diagram C.]

Write $Z = \Phi(x) = \int_x^k xf(x)\,dx$, i.e. the sum of the values of
x from x to k. Then, if $\bar x$ is the average of the curve, the total
of all the values of x = $N\bar x = \Phi(h)$.

We have
$$\frac{dY}{dx} = \frac{dF}{dx} = -f(x) = -y, \qquad \frac{dZ}{dx} = \frac{d\Phi}{dx} = -xf(x) = -xy,$$
$$\therefore\; \frac{dZ}{dY} = x \quad \dots \text{ (i)}$$

* For a general discussion on the use of these quantities and most of the
formulae in this note, see Bulletin de l'Institut International de Statistique,
Tome XXV, Livraison, 1931, pp. 189-320.
Write $\eta_u$ for the mean deviation of the curve about x = u.

$$N\eta_u = \int_u^k (x - u)y\,dx + \int_h^u (u - x)y\,dx$$
$$= \Phi(u) - uF(u) + u(N - F(u)) - (N\bar x - \Phi(u))$$
$$= 2\{\Phi(u) - uF(u)\} + N(u - \bar x).$$

This is a minimum for the value of u which satisfies

$$2\Phi'(u) - 2uF'(u) - 2F(u) + N = 0,$$

i.e. when $-2uf(u) + 2uf(u) - 2F(u) + N = 0$,

i.e. when $F(u) = \frac N2$ and u is the median, say m.

Then $N\eta_m = 2\Phi(m) - N\bar x = 2\{\Phi(m) - \bar xF(m)\}$ . . . (ii)

If $u = \bar x$, $N\eta_{\bar x} = 2\{\Phi(\bar x) - \bar xF(\bar x)\}$ . . . (iii)

We have $\eta_{\bar x} > \eta_m$, since the latter is the minimum value.
Mean Difference. The number of differences is $\frac12N(N - 1)$,
which may be taken as $\frac12N^2$ for purposes of integration.

Write g for the mean difference of the observations.

$$\tfrac12N^2g = \int_h^k f(x)\left\{\int_x^k f(u)(u - x)\,du\right\}dx,$$

for the difference u − x taken positively is to be multiplied by
f(u) and f(x), the number of observations at u and at x. We
first keep f(x) constant, and obtain the sum of the differences
given by all ordinates whose abscissa is greater than x, and
then integrate for x between extreme values.

$$\tfrac12N^2g = \int_h^k f(x)\{\Phi(x) - xF(x)\}\,dx$$
$$= \int_0^N (Z\,dY - Y\,dZ) = 2\int_0^N Z\,dY - [ZY]_0^N \quad\text{(integrating by parts)}$$
$$= 2\int_0^N Z\,dY - N\bar x\,.\,N,$$

since $Z = N\bar x$, Y = N, when x = h, and Z = 0 = Y when x = k.

Draw a graph of Z as a function of Y, KSVB, KH = AB =
N, KA = HB = $N\bar x$.

When x = h, Z = HB, Y = KH; as x increases to k,
Z and Y diminish to zero at K.

Then $\tfrac12gN^2$ = 2 Area KVBH − KABH,
$$\therefore\; \frac{g}{\bar x} = \frac{4\,\mathrm{KVB}}{\mathrm{KABH}},$$

where KVB is the curvilinear area bounded by KVB, KB.
KM, KU are the values of Y at the median and average.
Verticals through M and U meet KB at T and P, and the
curve at V and S.

Since HB = KH × $\bar x$,

MT = KM × $\bar x$ = $\bar xF(m)$,

UP = KU × $\bar x$ = $\bar xF(\bar x)$;
also US = $\Phi(\bar x)$, MV = $\Phi(m)$.

∴ $N\eta_m$ = 2(MV − MT) = 2VT, from equation (ii),

and $\frac{\eta_m}{\bar x} = \frac{4\,\mathrm{KVB}}{\mathrm{KABH}}$, where KVB is rectilinear.

Similarly from equation (iii)

$\frac{\eta_{\bar x}}{\bar x} = \frac{4\,\mathrm{KSB}}{\mathrm{KABH}}$, where KSB is rectilinear.

At S, $\frac{dZ}{dY} = \bar x$, from equation (i), and therefore the tangent
at S is parallel to KB, and PS is the maximum ordinate of the
figure KVBPK.

Without loss of generality we can choose our scales so that
N = 1, $\bar x$ = 1.

Then g, $\eta_{\bar x}$, $\eta_m$ are identified with four times the area KSB
(curvilinear), KSB (rectilinear) and KVB, which are (except in
extreme cases of equality) in descending order of magnitude.

If, for example, x stands for income, any position, such as
V, shows the proportion of aggregate income (VM ÷ KA),
that accrues to the proportion (KM ÷ KH) of all holders
of income over h.

If all incomes tended to be equal, the curve KVB would
approximate to the line KB.

Write $\Lambda = \frac{\mathrm{KSVB}}{\mathrm{KABH}} = \frac{g}{4\bar x}$. Then $\Lambda$ is the Lorenz measurement
of inequality of distribution, or (as it is sometimes termed)
of concentration of income among the richer. With $\Lambda$ = 0 all
incomes are equal; as $\Lambda$ approaches its maximum, $\frac12$, a greater
and greater proportion of income tends to be in the hands
of the richest.

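Application to Pareto's Curve.

Take k as infinite.

Y, the number of incomes above x (£) = $\frac{A}{x^\alpha}$, where $\alpha > 1$,
and A and $\alpha$ are constants.

$N = \frac{A}{h^\alpha}$, and $Y = N\left(\frac hx\right)^\alpha$.

$$Z = \int_x^\infty -x\,.\,\frac{dY}{dx}\,.\,dx = Nh^\alpha\int_x^\infty \alpha x^{-\alpha}\,dx = Nh^\alpha\,.\,\alpha\,.\,x^{1-\alpha} \div (\alpha - 1).$$

Write h for x, and we have $N\bar x = Nh\alpha \div (\alpha - 1)$.

$$\therefore\; Z = N\bar x\left(\frac hx\right)^{\alpha - 1}.$$

Write $Y_1 = Y \div N$, the relative number of incomes above x,
and $Z_1 = Z \div N\bar x$, the relative amount of income above x.

Then $Y_1 = \left(\frac hx\right)^\alpha$, $Z_1 = \left(\frac hx\right)^{\alpha - 1}$,
and $Z_1 = Y_1^{\frac{\alpha - 1}{\alpha}} = Y_1^{\frac1\delta}$, if $\frac1\alpha + \frac1\delta = 1$.*

When $\alpha$ approaches 1, $\delta$ increases indefinitely and income
is "concentrated" in few hands; at the same time inequality
of incomes diminishes.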

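With this equation

$$\tfrac12N^2g = 2\int_0^1 N\bar xZ_1\,.\,N\,dY_1 - N^2\bar x = N^2\bar x\,.\,\frac{1}{2\alpha - 1} = N^2\bar x\,.\,\frac{\delta - 1}{\delta + 1}.$$

$$N\eta_{\bar x} = 2(Z - \bar xY) = 2N\bar x\left\{\left(\frac1\delta\right)^{\alpha - 1} - \left(\frac1\delta\right)^{\alpha}\right\}, \qquad \therefore\; \frac{\eta_{\bar x}}{\bar x} = \frac{2(\delta - 1)}{\delta^\alpha},$$

where Z and Y are taken at $x = \bar x$, for which $\frac{h}{\bar x} = \frac1\delta$.

The median is given by $Y = \frac12N$, $m = h \times 2^{\frac1\alpha}$.

$$\eta_m = \frac2N(Z_m - \bar xY_m) = \bar x\left(2^{\frac1\alpha} - 1\right).$$

In the case where $\alpha = 1.5$,† we find

$\bar x = 3h$, $Y_1 = Z_1^3$, $g = 3h$, $\eta_{\bar x} = \frac{4h}{\sqrt3} = 2.31h$, $\eta_m = 1.76h$, $\delta = 3$,
$\Lambda = \frac14$.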
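
The figures for the case α = 1.5 can be checked directly from the closed forms above. The following Python sketch does so with h = 1; the function and key names are illustrative only.

```python
def pareto_measures(alpha, h=1.0):
    """Closed forms for Pareto's curve Y = N (h/x)**alpha, with alpha > 1."""
    delta = alpha / (alpha - 1)              # from 1/alpha + 1/delta = 1
    mean = h * alpha / (alpha - 1)           # the average income
    g = 2 * mean / (2 * alpha - 1)           # mean difference
    eta_mean = 2 * mean * (delta - 1) / delta ** alpha  # mean deviation about the average
    median = h * 2 ** (1 / alpha)
    eta_median = mean * (2 ** (1 / alpha) - 1)          # mean deviation about the median
    lorenz = g / (4 * mean)                  # the Lorenz measurement
    return {
        "delta": delta, "mean": mean, "g": g, "eta_mean": eta_mean,
        "median": median, "eta_median": eta_median, "lorenz": lorenz,
    }


measures = pareto_measures(1.5)
for key, value in measures.items():
    print(key, round(value, 4))
```

The three measures come out in the descending order g, mean deviation about the average, mean deviation about the median, as stated for the general curve.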

* The quantity $\delta$ in this connection is used by Professor Gini.
† The diagram illustrates this case.
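Application to Normal Curve.

Write $y = f(x) = \frac{N}{\sigma\sqrt{2\pi}}e^{-\frac{(x - \bar x)^2}{2\sigma^2}}$, k is ∞, h is −∞, and
f(k) = 0 = f(h).

Then $\frac{dy}{dx} = -\frac{x - \bar x}{\sigma^2}\,.\,y$, and since $\frac{dY}{dx} = -y$, we have
$(x - \bar x)\,dY = \sigma^2\,dy$.

Now
$$\tfrac12N^2g = \int_0^N (Z\,dY - Y\,dZ) = N^2\bar x - 2\int_0^N Y\,dZ$$
$$= N^2\bar x - 2\int_0^N xY\,dY$$
$$= N^2\bar x - 2\bar x\int_0^N Y\,dY - 2\int_0^N (x - \bar x)Y\,dY$$
$$= 2\int_0^N (\bar x - x)Y\,dY, \text{ since } 2\int_0^N Y\,dY = [Y^2]_0^N = N^2.$$

So far, true for any curve.

For the normal curve this becomes
$$= -2\sigma^2\int Y\,dy = -[2\sigma^2yY] + 2\sigma^2\int_h^k y^2\,dx$$
$$= 0 + 2\sigma^2\int_h^k \frac{N^2}{2\pi\sigma^2}e^{-\frac{(x - \bar x)^2}{\sigma^2}}\,dx = \frac{\sigma N^2}{\sqrt\pi},$$

$$\therefore\; g = \frac{2\sigma}{\sqrt\pi}.$$

We already know that the mean deviation = $\sigma\sqrt{\frac2\pi}$
(p. 269 (24)). This is consistent with equation (iii) p. 461, for
$\bar x = m$, and

$$N\eta_m = N\eta_{\bar x} = 2\{\Phi(\bar x) - \bar xF(\bar x)\} = 2\int_{\bar x}^k (x - \bar x)y\,dx = 2\sigma^2\,.\,\frac{N}{\sigma\sqrt{2\pi}} = N\sigma\sqrt{\frac2\pi}.$$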
Supplement IV. — Time Series.

(See pp. 132, 137, 148, 337 seq.)

Suppose that we have a series of annual records which we
have reason to think depend on a non-periodic trend, affected
by sporadic variation that is independent of the date.

We may write for any observed value Y = f(T) + v, where
f(T) expresses the trend and v is the residual, T measuring the
number of years from any zero time.

Take the particular case where f(T) is expansible in a
rapidly converging series and write

$$Y = a + bT + cT^2 + dT^3 + \dots + v.$$

Transform this so that time (t) is measured from the centre
of the period under consideration,

$$Y = a + bt + ct^2 + dt^3 + \dots + v.$$

Write $T_s$ for mean $t^s$.

Then when s is odd, $T_s = 0$, i.e. $T_1 = T_3 = T_5 = \dots = 0$.
It can be shown, or verified, that, when n is the number of
years considered,

$$T_2 = (n^2 - 1)/12, \qquad T_4 = (n^2 - 1)(3n^2 - 7)/240,$$
$$T_6 = T_2\,.\,(3n^4 - 18n^2 + 31)/112,$$
$$T_4 - T_2^2 = (n^2 - 1)(n^2 - 4)/180,$$
$$T_2T_6 - T_4^2 = (n^2 - 1)^2(n^2 - 4)(n^2 - 9)/33600.$$

These formulae are true whether n is odd or even.
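
The verification is easily mechanised. The Python sketch below compares the closed forms with means computed directly from the n centred values of t, using exact fractions so that odd and even n are treated alike; the function names are illustrative.

```python
from fractions import Fraction


def t_moment(n, s):
    """Mean of t**s, where t takes n yearly values centred on zero."""
    ts = [Fraction(2 * i - (n - 1), 2) for i in range(n)]
    return sum(t ** s for t in ts) / n


def closed_forms(n):
    """T2, T4, T6 from the formulae in the text."""
    n2 = Fraction(n * n)
    T2 = (n2 - 1) / 12
    T4 = (n2 - 1) * (3 * n2 - 7) / 240
    T6 = T2 * (3 * n2 ** 2 - 18 * n2 + 31) / 112
    return T2, T4, T6


def formulae_hold(n):
    """True if every formula agrees exactly with direct computation."""
    n2 = Fraction(n * n)
    T2, T4, T6 = closed_forms(n)
    return (
        (T2, T4, T6) == tuple(t_moment(n, s) for s in (2, 4, 6))
        and all(t_moment(n, s) == 0 for s in (1, 3, 5))
        and T4 - T2 ** 2 == (n2 - 1) * (n2 - 4) / 180
        and T2 * T6 - T4 ** 2 == (n2 - 1) ** 2 * (n2 - 4) * (n2 - 9) / 33600
    )


print(all(formulae_hold(n) for n in range(2, 31)))
```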

Compute a, b, c, d . . . by the hypotheses that 0 = mean
v = mean vt = mean $vt^2$ = . . . , thus expressing arbitrarily the
mutual independence of v and t.

Write $\bar y, m_1, m_2, m_3 \dots$ for mean Y, $yt$, $yt^2$, $yt^3$ . . ., where
$y = Y - \bar y$.

Consider only powers up to $t^3$.

We have

$$Y = a + bt + ct^2 + dt^3 + v.$$

Add the n expressions of this kind and take the mean.

$\bar y = a + 0 + cT_2 + 0 + 0$, since mean v = 0,

$$y = Y - \bar y = bt + c(t^2 - T_2) + dt^3 + v.$$

Multiply the last equation by t, and take the mean.

$$m_1 = bT_2 + c(T_3 - T_2\,.\,T_1) + dT_4 + 0 = bT_2 + dT_4,$$
since mean vt = 0.
Similarly multiply the same equation by $t^2$, and take the
mean.

$m_2 = c(T_4 - T_2^2)$, since mean $vt^2$ = 0,

and again

$m_3 = bT_4 + dT_6$, since mean $vt^3$ = 0.

Hence

$$Y = \bar y - \frac{m_2T_2}{T_4 - T_2^2} + \frac{m_1T_6 - m_3T_4}{T_2T_6 - T_4^2}\,t + \frac{m_2}{T_4 - T_2^2}\,t^2 + \frac{m_3T_2 - m_1T_4}{T_2T_6 - T_4^2}\,t^3 + v.$$

The values of $\bar y, m_1, m_2 \dots$ are to be computed from the
observations. Those of $T_2, T_4 \dots$ are given above in terms
of n.


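When d is zero, the equation becomes that of the parabola

$$Y = \bar y - \frac{15m_2}{n^2 - 4} + \frac{12m_1}{n^2 - 1}\,t + \frac{180m_2}{(n^2 - 1)(n^2 - 4)}\,t^2 + v.$$

When c is also zero, $m_2$ is zero, and the equation is linear,

$$Y = \bar y + \frac{12m_1}{n^2 - 1}\,t + v.\text{*}$$

In this last case $y = \frac{12m_1}{n^2 - 1}\,.\,t + v$.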

Multiply by y and take the mean:

$$\sigma_y^2 = \frac{12m_1}{n^2 - 1}\,.\,m_1 + \text{mean } vy.$$

Multiply by v and take the mean:

$$\text{Mean } vy = 0 + \sigma_v^2.$$

$$\therefore\; \sigma_v^2 = \sigma_y^2 - \frac{12m_1^2}{n^2 - 1} = \sigma_y^2 - b^2T_2 = \sigma_y^2 - \frac{m_1^2}{T_2}.$$

This expression indicates the reduction of the deviations
when the observations are measured from the chosen linear
trend. There is no improvement, if 0 = $m_1$ = mean yt. At
the other extreme $\sigma_v = 0$, if y = bt throughout the period, for
then $\sigma_y^2 = b^2T_2$.
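
A short Python sketch of the linear case follows, using a hypothetical series of five annual records. It computes the gradient from the first product-moment and confirms that the variance of the residuals, found directly, equals the variance of y less the square of the first product-moment divided by T2.

```python
def linear_trend(Y):
    """Fit Y = y_bar + b*t + v with t measured from the centre of the period."""
    n = len(Y)
    t = [i - (n - 1) / 2 for i in range(n)]
    y_bar = sum(Y) / n
    y = [value - y_bar for value in Y]
    m1 = sum(yi * ti for yi, ti in zip(y, t)) / n   # mean of y*t
    T2 = (n * n - 1) / 12                           # mean of t**2
    b = m1 / T2                                     # equals 12*m1/(n**2 - 1)
    var_y = sum(yi * yi for yi in y) / n
    var_v_formula = var_y - m1 * m1 / T2
    residuals = [yi - b * ti for yi, ti in zip(y, t)]
    var_v_direct = sum(r * r for r in residuals) / n
    return b, var_y, var_v_formula, var_v_direct


Y = [10.0, 12.5, 13.0, 16.5, 18.0]  # hypothetical annual records
b, var_y, var_v_formula, var_v_direct = linear_trend(Y)
print(b, var_y, var_v_formula, var_v_direct)
```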

In the case of a parabola, we find similarly

$$\sigma_v^2 = \sigma_y^2 - b^2T_2 - c^2(T_4 - T_2^2) = \sigma_y^2 - \frac{m_1^2}{T_2} - \frac{m_2^2}{T_4 - T_2^2}.$$

* It is easily seen that the coefficient of t is $r\sigma_y/\sigma_t$, as in the usual regression
equation.
For the cubic,

$$\sigma_v^2 = \sigma_y^2 - \frac{m_1^2}{T_2} - \frac{m_2^2}{T_4 - T_2^2} - \frac{(T_2m_3 - T_4m_1)^2}{T_2(T_2T_6 - T_4^2)}.$$

A measure of the significance of b and c (in the case of a
parabola) may be determined thus:

If there were no trend and the y's were distributed at
random about zero, the value of $b = \Sigma yt/nT_2$ would be
fortuitous.

Then b is a weighted sum of the y's, the weight of $y_t$ being
$t/nT_2$.

$$\therefore\; \sigma_b^2 = \sigma_y^2\,\frac{\Sigma t^2}{(nT_2)^2} = \frac{\sigma_y^2}{nT_2} = \frac{12}{n(n^2 - 1)}\,.\,\sigma_y^2 \quad (\text{p. } 316).$$

Similarly, $c = \Sigma yt^2/n(T_4 - T_2^2)$,

$$\sigma_c^2 = \sigma_y^2\,\frac{nT_4}{n^2(T_4 - T_2^2)^2} = \sigma_y^2\,.\,\frac{135(3n^2 - 7)}{n(n^2 - 1)(n^2 - 4)^2}.$$

[See Journal of the R.S.S., 1886, pp. 469-475, Edgeworth, and
1926, pp. 307, Bowley.]


Correlation of Time Series.

[Journal of the R.S.S., 1926, pp. 300 seq.]

With the notation already used (p. 465), take two series
$$x = X - \bar x = b_1t + c_1(t^2 - T_2) + u$$
$$y = Y - \bar y = b_2t + c_2(t^2 - T_2) + v$$
where $b_1, b_2, c_1, c_2$ are determined as before by the conditions
0 = mean u = mean v = mean ut = mean vt = mean $ut^2$ =
mean $vt^2$.

Multiply these equations and equate the means of the left-
and right-hand products, remembering that 0 = $T_1$ = $T_3$.

$$\text{Mean } xy = b_1b_2T_2 + c_1c_2(T_4 - T_2^2) + \text{mean } uv.$$

$$r_{xy}\,.\,\sigma_x\,.\,\sigma_y = b_1b_2T_2 + c_1c_2(T_4 - T_2^2) + r_{uv}\,.\,\sigma_u\,.\,\sigma_v.$$

Also by squaring each equation we have

$$\sigma_x^2 = b_1^2T_2 + c_1^2(T_4 - T_2^2) + \sigma_u^2$$
$$\sigma_y^2 = b_2^2T_2 + c_2^2(T_4 - T_2^2) + \sigma_v^2.$$

The equation for $r_{xy}$ shows the contributions to a crude
correlation coefficient, between two variables in time, made by
the trend constants and by the residuals.



This can be better visualised in the case of linear trends,
where $c_1 = c_2 = 0$.

Write $l_1 = b_1 \times \frac n2 \div \sigma_u$ = the increase due to trend in half
the range divided by the standard deviation of residuals from
the trend. Similarly write $l_2 = b_2 \times \frac n2 \div \sigma_v$.

Then
$$r_{xy} = \frac{\frac13l_1l_2\left(1 - \frac{1}{n^2}\right) + r_{uv}}{\sqrt{\left\{\frac13l_1^2\left(1 - \frac{1}{n^2}\right) + 1\right\}\left\{\frac13l_2^2\left(1 - \frac{1}{n^2}\right) + 1\right\}}},$$
since $T_2 = (n^2 - 1)/12$.

When n is great, this tends to

$$\left(\tfrac13l_1l_2 + r_{uv}\right) \Big/ \sqrt{\left(\tfrac13l_1^2 + 1\right)\left(\tfrac13l_2^2 + 1\right)}.$$

Thus when the trend-gradients are insignificant, $r_{xy}$ is
dominated by the correlation of the residuals, but when the
gradients are considerable they outweigh the effect of the
residuals.

Notice that if $b_2 = 0$, $r_{xy} = r_{uv}\,\sigma_u/\sigma_x$, and the trend of the
x line does not affect the correlation.
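
The decomposition for linear trends can be tried on a small constructed case. In the Python sketch below the two residual series are hypothetical and are chosen to have zero mean and zero product-moment with t, as the conditions require; the crude coefficient computed directly from x and y is compared with the value given by the formula in terms of the two trend ratios and the residual correlation.

```python
import math


def mean(values):
    return sum(values) / len(values)


def crude_r_from_trend(l1, l2, r_uv, n):
    """Crude correlation of two series with linear trends, from the text's formula."""
    k = (1 - 1 / n ** 2) / 3
    return (k * l1 * l2 + r_uv) / math.sqrt((k * l1 ** 2 + 1) * (k * l2 ** 2 + 1))


def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = mean([(a - mx) * (b - my) for a, b in zip(xs, ys)])
    sx = math.sqrt(mean([(a - mx) ** 2 for a in xs]))
    sy = math.sqrt(mean([(b - my) ** 2 for b in ys]))
    return cov / (sx * sy)


n = 4
t = [i - (n - 1) / 2 for i in range(n)]
u = [1.0, -1.0, -1.0, 1.0]   # hypothetical residuals: mean 0, mean u*t = 0
v = [0.0, 2.0, -4.0, 2.0]    # hypothetical residuals: mean 0, mean v*t = 0
b1, b2 = 1.0, 2.0            # hypothetical trend gradients

x = [b1 * ti + ui for ti, ui in zip(t, u)]
y = [b2 * ti + vi for ti, vi in zip(t, v)]

sigma_u = math.sqrt(mean([ui ** 2 for ui in u]))
sigma_v = math.sqrt(mean([vi ** 2 for vi in v]))
r_uv = mean([ui * vi for ui, vi in zip(u, v)]) / (sigma_u * sigma_v)
l1 = b1 * (n / 2) / sigma_u
l2 = b2 * (n / 2) / sigma_v

direct = pearson_r(x, y)
from_formula = crude_r_from_trend(l1, l2, r_uv, n)
print(round(r_uv, 4), round(direct, 4), round(from_formula, 4))
```

Here the residuals alone are correlated to the extent of about .41, while the common upward trend raises the crude coefficient to about .70.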


Supplement V. — The Logistic Curve.

(Note to Chapter V, p. 343 seq.)

Let P measure population at any time t.

Then, if P increased in continual geometric progression,
$\frac1P\,.\,\frac{dP}{dt}$ would be constant, and P would become infinitely great
as t increased without limit.

Empirical formulae can be suggested to damp down the
increase; that best known is the Logistic Curve, whose equation
is

$$\frac1P\,.\,\frac{dP}{dt} = \frac1a\left(1 - \frac PL\right).$$

Here a and L are constants. Growth continues till P = L,
which is the limit of population. a measures the time-scale.
The integral form is readily obtained as

$$P = L \div \left(1 + e^{\frac{b - t}{a}}\right),$$

where b is a constant.
where b is a constant. 
Take the arbitrary time zero at b, writing $\tau = t - b$.

Then $P = L \div \left(1 + e^{-\frac\tau a}\right)$,

and
$$P - \frac L2 = \frac L2\,.\,\frac{e^{\frac{\tau}{2a}} - e^{-\frac{\tau}{2a}}}{e^{\frac{\tau}{2a}} + e^{-\frac{\tau}{2a}}} = \frac L2\tanh\frac{\tau}{2a},$$

which is skew-symmetrical in $\tau$ about the value $P = \frac L2$.

The rate of growth is a maximum when $\frac{d^2P}{dt^2} = 0$, that is,
when $P = \frac L2$.

A method * of evaluating L, a and b from the observations
is to write $P^{-1} = Q$.

Then $QL - 1 = e^{\frac{b - t}{a}}$.

Take three values, $Q_{-1}$, $Q_0$, $Q_1$, at equal time intervals,
viz.: t = −h, 0, +h, where zero time is taken at the middle
observation; h may be a Census interval or a multiple thereof.

Then $Q_{-1}L - 1 = e^{\frac{b + h}{a}}$, $Q_0L - 1 = e^{\frac ba}$, $Q_1L - 1 = e^{\frac{b - h}{a}}$.

Then
$$\frac ha = \log_e (Q_0 - Q_{-1}) - \log_e (Q_1 - Q_0),$$
$$\frac ba = \log_e (Q_0 - Q_{-1}) + \log_e (Q_1 - Q_0) - \log_e (Q_1Q_{-1} - Q_0^2),$$
$$L(Q_1Q_{-1} - Q_0^2) = Q_1 + Q_{-1} - 2Q_0,$$
as can easily be verified. Hence a, b and L can be found.
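
The three-point method can be tried on a curve whose constants are known. In the Python sketch below the values L = 100, a = 2, b = 1 and the interval h = 3 are hypothetical; since the two differences of Q have the same sign, the logarithms are taken of their ratio and product.

```python
import math


def logistic(t, L, a, b):
    """P = L / (1 + exp((b - t) / a))."""
    return L / (1 + math.exp((b - t) / a))


def fit_three_points(p_minus, p_zero, p_plus, h):
    """Recover L, a, b from populations at t = -h, 0, +h."""
    q_m, q_0, q_p = 1 / p_minus, 1 / p_zero, 1 / p_plus
    L = (q_p + q_m - 2 * q_0) / (q_p * q_m - q_0 ** 2)
    a = h / math.log((q_0 - q_m) / (q_p - q_0))
    b = a * math.log((q_0 - q_m) * (q_p - q_0) / (q_p * q_m - q_0 ** 2))
    return L, a, b


# Hypothetical constants, used only to generate three exact observations.
true_L, true_a, true_b, h = 100.0, 2.0, 1.0, 3.0
observed = [logistic(t, true_L, true_a, true_b) for t in (-h, 0.0, h)]

L, a, b = fit_three_points(*observed, h)
print(round(L, 6), round(a, 6), round(b, 6))
```

With observations that do not lie exactly on a logistic curve the three constants still reproduce the three chosen points, but the result depends on which points are chosen.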

A variant † of the equation is obtained by writing

$$\frac1P\,.\,\frac{dP}{dt} = \frac1a\sqrt{\left(1 - \frac PL\right)}.$$

The integral of this may be written

$$P = \frac{4L}{e^{\frac{t - b}{a}} + e^{-\frac{t - b}{a}} + 2} = L\,\mathrm{sech}^2\,\frac{t - b}{2a}.$$

* See R.S.S. Journal, 1925, Yule, pp. 49, 50.

† This was suggested to me by Dr. Rhodes. It is a special form of an
equation given by Dr. L. Hamburger, Chemisch Weekblad, Dec. 30, No. 5
(1933), Amsterdam. "Investigation on Complete Growth Functions," p. 121.

This curve is symmetrical about its maximum, P = L,
when t = b.

As t increases positively or negatively, P tends asymptotically
to zero. The population thus rises to a maximum and
then falls towards zero.

An infinite number of other forms is possible, and the choice
between them is arbitrary.


Supplement VI. — Transformations of the Normal Curve.

(See p. 346.)

Suppose z to be a variable normally distributed with frequency

$$\eta = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12z^2}.$$

Let x be related to z by the equation x = f(z), and consider
the frequency distribution of x, say y = F(x).

To every element $\eta\,dz$ at z, there will correspond an element
$y\,dx$ at x. To z = 0 corresponds the median,* but not in
general the average or mode, of F(x).

The relationship between the curves of $\eta$ and y will be as
indicated in the diagram D annexed, where $f(z) = \frac15(z + 5)^2$ is
taken as an example. The rectangular area at N, where z = 1,
becomes the broader and lower rectangle at N', where x = 7.2.

Simple Case of Translation.

In Edgeworth's method of Translation f(z) is taken in
the form $x = a + bz + cz^2 + dz^3$, which is suitable when the
variations of x depend on the variations of the cube of z. The
full working out, with a suitable Table, is to be found in F. Y.
Edgeworth's Contributions to Mathematical Statistics, issued as
a separate pamphlet by the Royal Statistical Society in 1928.
A simple case will illustrate the method.

Let $x = m + a(z + bz^2)$, where m is the median value of x.

This equation can be fitted to observation either by moments
or by three percentiles.

* When formulae in which f(i) > o are excluded. 



Method of Moments.

Measured from the median, $M_s = \int_{-\infty}^{\infty}\{a(z + bz^2)\}^s\,\eta\,dz$.

$M_0 = 1$, $M_1 = ab$, $M_2 = a^2(1 + 3b^2)$, $M_3 = 3a^3b(3 + 5b^2)$, as
may be found by direct integration.
We readily find that, writing w for $2b^2$,

$$\kappa^2 = \frac{2w(3 + 2w)^2}{(1 + w)^3},$$

and therefore w is a root of

$$w^3(8 - \kappa^2) + 3w^2(8 - \kappa^2) + 3w(6 - \kappa^2) - \kappa^2 = 0.$$

A first approximation gives $18w = \kappa^2$, which leads to
$$a = \sigma \Big/ \sqrt{1 + \frac{\kappa^2}{18}},$$
when $\sigma^2 = \mu_2$.

The full solution gives b, then the value of $\mu_2$ gives a, and
the median is −ab from the average of the observations, $\bar x$,
so that $x = \bar x - ab + a(z + bz^2)$.

Thus in the example on p. 305, $\kappa = .4093$, $w = .00943$,
$b = .0687$, $\sigma = 9.4155$, $a = \sigma \div \sqrt{1.00943} = 9.372$, $m = 51.453 - ab = 50.810$, and

$$x = 50.810 + 9.372\,(z + .0687z^2) \quad \dots \text{ (i)}$$
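
The full solution of the cubic is conveniently found by bisection. The Python sketch below takes the three quantities of the example as given and assumes, as the reconstruction of the moments above implies, that w stands for twice the square of b and that the second moment about the average is the square of a multiplied by 1 + w. The results agree with the printed constants to about three significant figures.

```python
import math


def solve_w(kappa, tol=1e-12):
    """Root in (0, 1) of 2w(3 + 2w)^2 = kappa^2 (1 + w)^3, by bisection."""
    k2 = kappa ** 2

    def f(w):
        return 2 * w * (3 + 2 * w) ** 2 - k2 * (1 + w) ** 3

    lo, hi = 0.0, 1.0          # f(lo) < 0 < f(hi) for moderate kappa
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def fit_translation(kappa, sigma, average):
    """Constants of x = m + a(z + b z^2) from skewness, s.d. and average."""
    w = solve_w(kappa)
    b = math.sqrt(w / 2)             # w = 2 b^2 (assumed definition of w)
    a = sigma / math.sqrt(1 + w)     # since sigma^2 = a^2 (1 + 2 b^2)
    m = average - a * b              # the median lies a*b below the average
    return w, b, a, m


w, b, a, m = fit_translation(kappa=0.4093, sigma=9.4155, average=51.453)
print(round(w, 5), round(b, 4), round(a, 3), round(m, 3))
```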


Method of percentiles.

Let $z_1, z_2, z_3$ correspond to $x_1, x_2, x_3$.

The known proportion ($p_1$) of these observations less than $x_1$
is the same as the proportion below $z_1$;

Hence
$$-\tfrac12 + p_1 = \int_0^{z_1} \frac{1}{\sqrt{2\pi}}\,e^{-\frac12z^2}\,dz,$$
and $z_1$ can be written down from the normal table.

We then have $x_1 = m + a(z_1 + bz_1^2)$ and two similar
equations to determine m, a and b. b is found from

$$\frac{x_2 - x_1}{x_3 - x_2} = \frac{(z_2 - z_1)(1 + b(z_2 + z_1))}{(z_3 - z_2)(1 + b(z_3 + z_2))},$$

and thence a and m.

Select three values from the Table on p. 305.

$x_1 = 31.5$, $p_1 = .008$, $-\frac12 + p_1 = -.492$, $z_1 = -2.410$

$x_2 = 51.5$, $p_2 = .525$, $-\frac12 + p_2 = .025$, $z_2 = .063$

$x_3 = 86.5$, $p_3 = .999$, $-\frac12 + p_3 = .499$, $z_3 = 3.10$

The solution of the equations gives

$$x = 50.85 + 9.53\,(z + .0653z^2) \quad \dots \text{ (ii)}$$

Obtained by adding the "observations" column to the given value of x.
The observations are compared with the curve by finding
the values of z that correspond to the given values of x, and
so the number that should fall in each grade. Or we can write
down the values of z that correspond to successive groups of
occupations and find how nearly the resulting x's agree with
the limits of the grades in the data.

The method is only applicable when $\kappa$ is not great, say not
greater than unity (see R.S.S. Journal, 1898, pp. 695-7).

When $\kappa$ is small, so that $\kappa^2$ is negligible, the distribution
tends to that given on p. 302.

Within these limits the method is useful.

The Law of Proportional Effect.

Take the transforming equation as $x = x_0 + e^{\frac{z + b}{a}}$,
so that $z = a\log_e (x - x_0) - b$.

Then $dz = a\,.\,\frac{d(x - x_0)}{x - x_0}$, so that a small absolute variation
in z is proportional to a small relative variation in $(x - x_0)$.

The frequency of x is given by

$$y\,dx = \eta\,dz$$

$$y = \frac{a}{x - x_0}\,.\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac12\{a\log_e (x - x_0) - b\}^2}.$$

The constants $x_0$, a, and b are to be determined from the
observations.

Diagram D. I will serve to indicate the relative position of
the points, though it is drawn from a "translation" equation.

O is the origin from which the x observations are measured.
M, D and A are respectively the median, mode and average of
the curve y = F(x). OB = $x_0$.

The median of the new curve corresponds to z = 0.

Hence OM = $x_0 + e^{\frac ba}$, and $\log_e \mathrm{BM} = \frac ba$.

The mode is obtained from $\frac{d}{dx}\log y = 0$, which leads to

$$\log_e \mathrm{BD} = \frac ba - \frac{1}{a^2}.$$

The area remains unity and the average is given by

$$\bar x = \mathrm{OA} = \int xy\,dx = \int_{-\infty}^{\infty}\left(x_0 + e^{\frac{z + b}{a}}\right)\eta\,dz = x_0 + e^{\frac ba + \frac{1}{2a^2}}.$$

$$\log \mathrm{BA} = \log (\mathrm{OA} - x_0) = \frac ba + \frac{1}{2a^2}.$$

Hence log BD + 2 log BA = 3 log BM, and $\log\frac{\mathrm{BA}}{\mathrm{BM}} = \tfrac13\log\frac{\mathrm{BA}}{\mathrm{BD}}$,
which is analogous to equation (145), p. 444.*

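To find the constants we can again proceed either by the
use of moments or of percentiles, or graphically.

Method of Moments. The average has already been found.

$$\mu_2 = \int (x - \bar x)^2y\,dx, \qquad \mu_3 = \int (x - \bar x)^3y\,dx.$$

The integration is fairly simple and leads to

$$\sigma^2 = \mu_2 = e^{\frac{2b}{a}}\,.\,e^{\frac{1}{a^2}}\left(e^{\frac{1}{a^2}} - 1\right), \qquad \kappa = \left(e^{\frac{1}{a^2}} + 2\right)\left(e^{\frac{1}{a^2}} - 1\right)^{\frac12}.\text{†}$$

These equations are sufficient to determine a, b and $x_0$.
We apply them to the example on p. 305.

Write $v = e^{\frac ba}$, $w = e^{\frac{1}{a^2}}$.

Then

$(w + 2)(w - 1)^{\frac12} = \kappa = .4093$. $w = 1.0184$, $a = 7.406$.

$vw^{\frac12}(w - 1)^{\frac12} = \sigma = 9.4155$.

* In the translation method we have the result MA = $\frac13$DA, when $\kappa^2$ and
the coefficient of $z^3$ are neglected.

† This is equivalent to an equation given by Wicksell, Genetic Theory of
Frequency, 1917, p. 14, equation (22).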
$v = 68.78$, $b = a\log_e v = 31.33$.

$x_0 + vw^{\frac12} = \bar x = 51.453$. $x_0 = -17.89$.

$$z = 7.406\log_e (x + 17.9) - 31.33 = 17.05\log_{10} (x + 17.9) - 31.33 \quad \dots \text{ (iii)}$$
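
The steps of this fitting can be followed in a short Python sketch, which takes the skewness, standard deviation and average of the example as given and solves the first equation for w by bisection. The function name is illustrative. The constants so found agree with those printed to within the rounding of the intermediate figures; the value of the origin is sensitive to that rounding.

```python
import math


def fit_proportional_effect(kappa, sigma, average):
    """Constants of z = a*log_e(x - x0) - b by the method of moments."""
    # Solve (w + 2) * sqrt(w - 1) = kappa for w between 1 and 2.
    lo, hi = 1.0, 2.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if (mid + 2) * math.sqrt(mid - 1) < kappa:
            lo = mid
        else:
            hi = mid
    w = (lo + hi) / 2
    a = 1 / math.sqrt(math.log(w))        # w = exp(1 / a^2)
    v = sigma / math.sqrt(w * (w - 1))    # v = exp(b / a)
    b = a * math.log(v)
    x0 = average - v * math.sqrt(w)       # average = x0 + v * sqrt(w)
    return w, a, v, b, x0


w, a, v, b, x0 = fit_proportional_effect(kappa=0.4093, sigma=9.4155, average=51.453)
print(round(w, 4), round(a, 3), round(v, 2), round(b, 2), round(x0, 2))
print(round(a * math.log(10), 2))  # multiplier when common logarithms are used
```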

Method of Percentiles. For ease of solution we require to
know the median and two percentiles at equal proportions
from it. In general we only know one of these except by
approximation.

Let $p_1$ be the proportion of the observations between the
median and either chosen percentile, and $x_1$, m, $x_2$ be the
scale readings for these percentiles and the median.

Determine z = h from the Table on p. 271 so that

$$\int_0^h \frac{1}{\sqrt{2\pi}}\,e^{-\frac12z^2}\,dz = p_1.$$

Then $x_1 = x_0 + e^{\frac{b - h}{a}}$, $m = x_0 + e^{\frac ba}$, $x_2 = x_0 + e^{\frac{b + h}{a}}$.

In the same example (p. 305), the median by interpolation
is 50.945 = m.

Take $x_1 = 41.5$, whence $p_1 = .357$, and h = 1.067.

For $x_2$, we must include .857 of the observations. We
obtain by interpolation $x_2 = 61.34$.

Write $v = e^{\frac ba}$, $u = e^{\frac ha}$.

$41.5 = x_1 = x_0 + v/u$, $50.945 = m = x_0 + v$,
$61.34 = x_2 = x_0 + vu$.

$$u = \frac{x_2 - m}{m - x_1} = 1.10, \text{ and } a = 11.165.$$

$$\frac1v = \frac{1}{m - x_1} - \frac{1}{x_2 - m}, \text{ and } v = 103.4.$$

$b = a\log_e v = 51.77$.

$x_0 = 50.945 - v = -52.45$.

$$z = 11.165\log_e (x - x_0) - 51.77 = 25.7\log_{10} (x + 52.5) - 51.8 \quad \dots \text{ (iv)}$$

It is noticeable that this equation differs in its constants
from that by the method of moments. But, as seen in the
diagram on p. 471, the middle and right-hand lines both give a
plausible fit to the observations. It is remarkable what little
change in the result is made by great changes in the constants.
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M. Gibfat’s Graphic Method,.* 

From the observations we can find, by the method just
used, the values of $z$ that correspond to the given values of $x$.†

We have therefore a number of equations of the form
$z = a \log_e (x - x_0) - b$, where $z$ and $x$ are known, and $x_0$, $a$,
and $b$ are unknown.

Take an arbitrary value of $x_0$, write $X = \log_e (x - x_0)$, so
that $z = aX - b$. Plot $z$, $X$ on squared paper from the observations.
If the points indicate concavity (or convexity) to the
axis of $Z$, decrease (or increase) $x_0$ and re-plot the points.
After one or two experiments a straight line will be found
approximately, if in fact the observations can be represented
by this form. From the graph, or by computation from two
selected points, $a$ and $b$ can then be found.


      Distribution of 1000 Sums of the Number of Letters in 10 Words.

    1       2       3       4        5         6        7       8      9     10     11       12
                                    log       log              Proportional Effect  Trans-  Normal
    x      F(z)     z     log x   (x+17.9)  (x+52.5)   Data    (iii)  (iv)   (v)    lation  Second
                                                                                     (i)    Approx.

   31.5    .492   -2.41    1.50    1.69      1.92        8       5      7     10      7        8
   36.5    .454   -1.68    1.56    1.74      1.95       38      34     35     33     41       37
   41.5    .357   -1.07    1.62    1.77      1.97       97     103     98     93     95       99
   46.5    .202   -0.53    1.67    1.81      1.996     155     184    172    172    170      173
   51.5    .023   +0.06    1.71    1.84      2.017     227     222    212    204    216      213
   56.5    .227   +0.61    1.75    1.87      2.037     202     188    198    193    191      191
   61.5    .361   +1.09    1.79    1.90      2.057     134     126    133    150    136      135
   66.5    .437   +1.53    1.82    1.93      2.075      76      75     77     84     80       78
   71.5    .474   +1.94    1.85    1.95      2.093      37      36     41     38     38       40
   76.5    .487   +2.23    1.88    1.975     2.106      13      16     17     16     17       18
   81.5    .496   +2.65    1.91    1.997     2.137       9       7      7      5      6        7
   86.5    .499   +3.09    1.94    2.019     2.143       3       3      2      1      2        2
   above 86.5                                            1       1      1      1      1        0

   $\chi^2$                                                     13      5     14      6       11

Col. 1 is the original scale. Col. 2 is computed from the Data, col. 7, and shows the proportion of
observations from the centre to the value of x. Col. 3 is from the Table of the Normal Curve. Cols.
8 to 11 are the results of computing z from equations (iii), (iv), (v) and (i) respectively, reading the
corresponding F(z) from the normal Table and writing down the differences. Col. 12 is from p. 305.


* R. Gibrat, Les Inégalités Économiques, Recueil Sirey, Paris, 1931.
M. Gibrat introduces the term "La loi de l'effet proportionnel," and gives many
illustrations of its use. The equation itself has been known for a long time.
(Cf. Wicksell, loc. cit.).

† Thus in the Table annexed eight observations (Col. 7) are below 31.5
(Col. 1); therefore x = 31.5 marks (½ - .492) of the 1000 observations, or .492 from
the median (Col. 2). (8 + 38) observations are below 36.5, which marks .454
from the median. Col. 3 is then obtained from the Table on p. 271, with due
regard to sign.

Diagram D. II shows three lines on this basis. That on the
left is

$z = 11.62 \log_{10} x - 19.83$ . . . . (v),

the next is from equation (iii), that to the right is from equation (iv).
The observations are shown by the dots. (v)
indicates slight convexity to the axis of $Z$, (iv) slight convexity,
(iii) is neutral.

The results, together with that by the moment method of
translation, are given in the Table, where also the last column
of the Table on p. 305 (the result of the second approximation
to the normal curve) is repeated.

Roughly computed values of $\chi^2$ (p. 431) indicate that any
of the equations gives a plausible fit; (iii) and (v) are the least
satisfactory; (iv) is the best, closely followed by (i), but
P < .5 also for the last column.


Supplement VII. — Correlation of Ranks. 

(Note to pp. 368-9.) 

Let $n$ persons be arranged in order as regards each of two
attributes, so that the $t$th person is of rank $x_t$ for the one and $y_t$
for the other. Then the values of $x_t$ are 1, 2 . . . $n$, as are
the values of $y_t$, but in general $x_t$ is not the same as $y_t$.

Write $\bar{x}$ and $\bar{y}$ for the averages and $\sigma_x$, $\sigma_y$ for the standard
deviations.

$$\bar{x} = \bar{y} = \tfrac{1}{2}(n + 1), \qquad \sigma_x^2 = \sigma_y^2 = \tfrac{1}{12}(n^2 - 1).$$

Write $D^2$ for the mean square difference between $x_t$ and $y_t$.

$$D^2 = \frac{1}{n}\sum (x_t - y_t)^2 = \frac{1}{n}\sum \{(x_t - \bar{x}) - (y_t - \bar{y})\}^2$$

$$\quad = \sigma_x^2 + \sigma_y^2 - 2R\sigma_x\sigma_y = \tfrac{1}{6}(n^2 - 1)(1 - R),$$

where $R$ is computed as a correlation coefficient.

$$R = 1 - \frac{6D^2}{n^2 - 1} \quad \text{(Spearman's coefficient)}$$

is a measure of the difference between the rankings of the same
person with respect to the two attributes.

$R = 1$ corresponds to identical orders, where $x_t = y_t$ for
each person.

If the orders are reversed so that $x_t = n + 1 - y_t$,

$$\sum (x_t - y_t)^2 = \sum (2x_t - n - 1)^2 = 4\sum x_t^2 - 4(n + 1)\sum x_t + n(n + 1)^2 = \frac{n(n^2 - 1)}{3},$$

and then $R = -1$.

R does not measure the correlation between the attributes, 
which is not obtainable from the data, and the true correlation 
may have any number of values for unchanged R. For 
example, R depends only on the orders in which candidates 
in an examination are placed, and not on the actual marks. 

Professor Karl Pearson has shown ( Drapers' Company 
Research Memoirs, No. IV) that, when there are an indefinitely 
large number of persons with attributes distributed normally 
with coefficient of correlation r, we have 
$$r = 2 \sin\left(\tfrac{1}{6}\pi R\right).$$

$r > R$, except when $r = R = 0$ or 1, but the maximum
difference between $r$ and $R$ is only .018, when $r = .6$, approx.
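
The rank coefficient and Pearson's relation are easily checked numerically. The sketch below is illustrative only: the function names `spearman_R` and `pearson_r_from_R` are hypothetical, ties are assumed absent, and the maximum of $r - R$ is found by a plain grid search rather than analytically.

```python
import math

def spearman_R(x_ranks, y_ranks):
    """R = 1 - 6*D^2/(n^2 - 1), where D^2 is the mean square rank difference."""
    n = len(x_ranks)
    d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks)) / n
    return 1.0 - 6.0 * d2 / (n * n - 1)

def pearson_r_from_R(R):
    """r = 2 sin(pi*R/6), valid for an indefinitely large normal population."""
    return 2.0 * math.sin(math.pi * R / 6.0)

ranks = list(range(1, 11))
R_same = spearman_R(ranks, ranks)            # identical orders
R_rev = spearman_R(ranks, ranks[::-1])       # reversed orders

# Largest value of r - R for R between 0 and 1 (grid of 0.001).
gap, R_at = max((pearson_r_from_R(k / 1000) - k / 1000, k / 1000)
                for k in range(1001))
print(R_same, R_rev, round(gap, 3), round(pearson_r_from_R(R_at), 2))
```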


Supplement VIII. — Note on Determinants. Rectilinear 

Regression. 


(Note to Chapter VIII.) 


Since only one property of Determinants is necessary for 
the treatment of linear regression equations, it is worth while 
to obtain it from first principles. 


Write

$$D = |a_1 b_2 c_3 d_4| = \begin{vmatrix} a_1 & b_1 & c_1 & d_1 \\ a_2 & b_2 & c_2 & d_2 \\ a_3 & b_3 & c_3 & d_3 \\ a_4 & b_4 & c_4 & d_4 \end{vmatrix}$$

$$\begin{aligned}
 = {} & a_1 b_2 c_3 d_4 - b_1 a_2 c_3 d_4 + c_1 a_2 b_3 d_4 - d_1 a_2 b_3 c_4 \\
 & - a_1 b_2 c_4 d_3 + b_1 a_2 c_4 d_3 - c_1 a_2 b_4 d_3 + d_1 a_2 b_4 c_3 \\
 & + a_1 b_4 c_2 d_3 - b_1 a_4 c_2 d_3 + c_1 a_4 b_2 d_3 - d_1 a_4 b_2 c_3 \\
 & - a_1 b_4 c_3 d_2 + b_1 a_4 c_3 d_2 - c_1 a_4 b_3 d_2 + d_1 a_4 b_3 c_2 \\
 & + a_1 b_3 c_4 d_2 - b_1 a_3 c_4 d_2 + c_1 a_3 b_4 d_2 - d_1 a_3 b_4 c_2 \\
 & - a_1 b_3 c_2 d_4 + b_1 a_3 c_2 d_4 - c_1 a_3 b_2 d_4 + d_1 a_3 b_2 c_4
\end{aligned}$$

Here every possible permutation of 1, 2, 3, 4 is used in the
order of suffixes applied to $a$, $b$, $c$, $d$. One interchange of
adjacent suffixes (or letters) is taken as changing the sign from
+ to - or from - to +. The first term being taken as
positive, the sign of every other term is determined.

Collect the coefficients (or "co-factors") of $a_1$, $b_1$, $c_1$, $d_1$,
and write them as $D_{11}$, $D_{12}$, $D_{13}$, $D_{14}$ respectively.

Then $D = a_1 D_{11} + b_1 D_{12} + c_1 D_{13} + d_1 D_{14}$.

Now, if the quantities in two rows are coincident, so that,
for example, $a_1 = a_2$, $b_1 = b_2$, $c_1 = c_2$, $d_1 = d_2$, the rule of
signs at once gives $D = 0$.

Therefore we have $0 = a_2 D_{11} + b_2 D_{12} + c_2 D_{13} + d_2 D_{14}$.

The co-factors are evidently determinants with one row
and one column fewer than in the original determinant.

These definitions are easily generalised. Write $D = |a_{11} a_{22} \ldots a_{nn}|$,
where the first suffix determines the row, the
second the column.

Then

$$a_{11} D_{11} + a_{12} D_{12} + \ldots + a_{1n} D_{1n} = D \qquad (\alpha)$$

$$\left.\begin{aligned}
a_{21} D_{11} + a_{22} D_{12} + \ldots + a_{2n} D_{1n} &= 0 \\
a_{31} D_{11} + a_{32} D_{12} + \ldots + a_{3n} D_{1n} &= 0 \\
\ldots \qquad & \\
a_{n1} D_{11} + a_{n2} D_{12} + \ldots + a_{nn} D_{1n} &= 0
\end{aligned}\right\} \qquad (\beta)$$

For example, in the determinant

$$D = \begin{vmatrix} a & h & g \\ h & b & f \\ g & f & c \end{vmatrix},$$

$a(bc - f^2) + h(fg - ch) + g(hf - bg) = D$
$h(bc - f^2) + b(fg - ch) + f(hf - bg) = 0$
$g(bc - f^2) + f(fg - ch) + c(hf - bg) = 0$
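
The single property used here (first-row co-factors multiplied by their own row give $D$, by any other row give zero) can be verified with a short recursive expansion. This is an illustrative sketch, not an efficient method; the helper names `minor`, `det` and `first_row_cofactors` and the numerical values of $a, b, c, f, g, h$ are arbitrary choices made for the example.

```python
def minor(m, i, j):
    """Matrix m with row i and column j struck out."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def first_row_cofactors(m):
    """D_11, D_12, ..., D_1n with their signs."""
    return [(-1) ** j * det(minor(m, 0, j)) for j in range(len(m))]

a, b, c, f, g, h = 2, 3, 5, 7, 11, 13          # arbitrary integers
m = [[a, h, g],
     [h, b, f],
     [g, f, c]]
cof = first_row_cofactors(m)                    # bc - f^2, fg - ch, hf - bg
row_sums = [sum(e * d for e, d in zip(row, cof)) for row in m]
print(cof, row_sums)                            # first sum is D, the rest vanish
```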

Rectilinear Regression. 2 Variables. 

Write

$$Y = aX + b + v \qquad \text{(i)}$$

and $Y = \bar{y} + y$, $X = \bar{x} + x$, where $\bar{y}$ and $\bar{x}$ are the averages
of the variables $Y$, $X$.

Assume that mean $v$ = 0 = mean $vx$, thus expressing
independence of $v$ and $x$. $v$ is then the residual, when $Y$ is
computed from $X$.

The mean of (i) is

$$\bar{y} = a\bar{x} + b + 0, \text{ since mean } v = 0 \qquad \text{(ii)}$$

Subtract (ii) from (i).

$$y = ax + v \qquad \text{(iii)}$$

Multiply (iii) by $x$ and take the mean.

mean $xy = a\sigma_x^2 + 0$, since mean $vx = 0$.

$$\therefore\; r\sigma_y = a\sigma_x \qquad \text{(iv)}$$

Here $\sigma_x$, $\sigma_y$ are the standard deviations of $x$, $y$ and $r$ is their
correlation coefficient.

Multiply (iii) by $v$ and take the mean.

$$\text{mean } vy = 0 + \sigma_v^2 \qquad \text{(v)}$$

Multiply (iii) by $y$ and take the mean.

$\sigma_y^2 = ar\sigma_x\sigma_y + \text{mean } vy$.

$\therefore\; \sigma_y^2 = r^2\sigma_y^2 + \sigma_v^2$, from (iv) and (v),
and $\sigma_v = \sigma_y\sqrt{1 - r^2}$.

Equation (i) becomes $Y = \bar{y} + r\dfrac{\sigma_y}{\sigma_x}(X - \bar{x}) + v$.

The reduction of the standard deviation from $\sigma_y$ to $\sigma_v$ is one
indication of how far our knowledge of $Y$ is improved by
estimating it with the help of $X$ as $\bar{y} + r\dfrac{\sigma_y}{\sigma_x}(X - \bar{x})$.

For example, in a collection of 145 budgets of family
expenditure, the average whole weekly expenditure per equivalent
adult was found to be $16.9 = \bar{x}$, with $\sigma_x = 7.20$ (shillings).
The expenditure on meat was $2.36 = \bar{y}$, when $\sigma_y = 1.14$.

$r$ was found to be .56.

The regression equation (meat on total expenditure) is
therefore

$Y = 2.36 + .56 \times \dfrac{1.14}{7.20}(X - 16.9) + v$
$\;\; = .09X + .87 + v$.

$\sigma_v = 1.14\sqrt{1 - .56^2} = .95$.

The reduction in standard deviation is not great, but the
distribution of $v$ is less unsymmetrical and more nearly normal
than that of $y$.

   Distributions of Y and of v Compared with Normal Distribution.

        Between             Normal.        Y              v
                                        +      -       +      -
    0       and  ±⅓σ         18.9      20     17      20     29
    ±⅓σ      ,,  ±⅔σ         16.9       8     27      17     20
    ±⅔σ      ,,  ±σ          13.7       9     21      12      9
    ±σ       ,,  ±1⅓σ         9.8      10     10       8      8
    ±1⅓σ     ,,  ±1⅔σ         6.3       4      6       4      3
    ±1⅔σ     ,,  ±2σ          3.6       6      0       3      1
    ±2σ      ,,  ±3σ          3.1       5      1       4      6
    ±3σ      ,,  ∞            0.2       1      0       1      0

                             72.5      63     82      69     76

[Allen and Bowley: Family Expenditure, p. 82.]
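
The family-budget example uses only the five summary constants, so it can be reproduced directly. The following is a minimal sketch; the function name `regression_from_summary` is hypothetical. Carrying more decimals gives a slope of about .0887 and a constant of about .86, which the text rounds to .09 and prints as .87.

```python
import math

def regression_from_summary(x_bar, y_bar, sd_x, sd_y, r):
    """Coefficients of Y = slope*X + intercept + v and the standard
    deviation of the residual v, from averages, deviations and r."""
    slope = r * sd_y / sd_x                    # a = r * sigma_y / sigma_x, from (iv)
    intercept = y_bar - slope * x_bar          # b, from (ii)
    sd_resid = sd_y * math.sqrt(1.0 - r * r)   # sigma_v
    return slope, intercept, sd_resid

slope, intercept, sd_resid = regression_from_summary(16.9, 2.36, 7.20, 1.14, 0.56)
print(round(slope, 4), round(intercept, 3), round(sd_resid, 3))
```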

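Rectilinear Regression. n Variables.

Let a variable $x_1$ be related to variables $x_2$, $x_3$ . . . $x_n$, all
measured from their averages, by the equation

$$-x_1 = a_2x_2 + a_3x_3 + \ldots + a_nx_n + v, \qquad \text{(i)}$$

where $v$ is a residual such that 0 = mean $vx_2$
= mean $vx_3$ . . . = mean $vx_n$.*

Multiply (i) by $x_1$ and take the mean.

$$-\sigma_1^2 = a_2r_{12}\sigma_1\sigma_2 + a_3r_{13}\sigma_1\sigma_3 + \ldots + a_nr_{1n}\sigma_1\sigma_n + \text{mean } vx_1 \qquad \text{(ii)}$$

where $\sigma_1$, $\sigma_2$ . . . are the standard deviations of $x_1$, $x_2$ . . ., and
$r_{12}$ is the coefficient of correlation between $x_1$, $x_2$ and so on. Of
course, $r_{12} = r_{21}$, etc.

Multiply (i) by $v$ and take the mean.

$$-\text{mean } vx_1 = 0 + 0 + \ldots + 0 + \sigma_v^2 \qquad \text{(iii)}$$

Multiply (i) by $x_2$ and take the mean.

$-r_{12}\sigma_1\sigma_2 = a_2\sigma_2^2 + a_3r_{23}\sigma_2\sigma_3 + \ldots + a_nr_{2n}\sigma_2\sigma_n + 0$

$$\left.\begin{aligned}
\therefore\; r_{21}\sigma_1 + a_2\sigma_2 + a_3\sigma_3r_{23} + \ldots + a_n\sigma_nr_{2n} &= 0 \\
\text{Similarly}\quad r_{31}\sigma_1 + a_2\sigma_2r_{32} + a_3\sigma_3 + \ldots + a_n\sigma_nr_{3n} &= 0 \\
\ldots \qquad & \\
r_{n1}\sigma_1 + a_2\sigma_2r_{n2} + a_3\sigma_3r_{n3} + \ldots + a_n\sigma_n &= 0
\end{aligned}\right\} \qquad \text{(iv)}$$

and from (ii)

$$\sigma_1 + a_2\sigma_2r_{12} + a_3\sigma_3r_{13} + \ldots + a_n\sigma_nr_{1n} = -\frac{\text{mean } vx_1}{\sigma_1} = \frac{\sigma_v^2}{\sigma_1}, \text{ from (iii).}$$

Write

$$D\dagger = \begin{vmatrix} 1 & r_{12} & r_{13} & \ldots & r_{1n} \\ r_{21} & 1 & r_{23} & \ldots & r_{2n} \\ \ldots & & & & \\ r_{n1} & r_{n2} & r_{n3} & \ldots & 1 \end{vmatrix}$$

Then the values of $a_2$, $a_3$ . . . $a_n$ are given by

$$\frac{\sigma_1}{D_{11}} = \frac{a_2\sigma_2}{D_{12}} = \frac{a_3\sigma_3}{D_{13}} = \ldots = \frac{a_n\sigma_n}{D_{1n}}, \qquad \text{(v)}$$

for if these are substituted in equation (iv), we have

$$\begin{aligned}
r_{21}D_{11} + 1 \cdot D_{12} + r_{23}D_{13} + \ldots + r_{2n}D_{1n} &= 0 \\
r_{31}D_{11} + r_{32}D_{12} + 1 \cdot D_{13} + \ldots + r_{3n}D_{1n} &= 0 \\
\ldots \qquad & \\
r_{n1}D_{11} + r_{n2}D_{12} + r_{n3}D_{13} + \ldots + 1 \cdot D_{1n} &= 0
\end{aligned}$$

which are identically true from equations ($\beta$), p. 479.

Equation (i) may now be written

$$x_1 = -\frac{\sigma_1}{D_{11}}\left(\frac{x_2}{\sigma_2}D_{12} + \frac{x_3}{\sigma_3}D_{13} + \ldots + \frac{x_n}{\sigma_n}D_{1n}\right) + v$$

(p. 408, last line, where $R_{12} = D_{12}$, etc.), the residual being now written with its sign reversed.

Also from (v),

$$\sigma_v^2 = \frac{\sigma_1^2}{D_{11}}\left(D_{11} + r_{12}D_{12} + r_{13}D_{13} + \ldots + r_{1n}D_{1n}\right) = \frac{\sigma_1^2}{D_{11}} \cdot D, \text{ from equation } (\alpha) \text{ p. 479.}$$

$$\sigma_v^2 = \sigma_1^2\,\frac{D}{D_{11}}.$$

When $n = 3$,

$$D = 1 + 2r_{12}r_{23}r_{31} - r_{23}^2 - r_{31}^2 - r_{12}^2 = 1 \cdot (1 - r_{23}^2) + r_{12}(r_{23}r_{31} - r_{12}) + r_{13}(r_{12}r_{23} - r_{13}),$$

$D_{11} = 1 - r_{23}^2$, $D_{12} = r_{23}r_{31} - r_{12}$, $D_{13} = r_{12}r_{23} - r_{13}$,

$$x_1 = \frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2} \cdot \frac{\sigma_1}{\sigma_2}x_2 + \frac{r_{13} - r_{12}r_{23}}{1 - r_{23}^2} \cdot \frac{\sigma_1}{\sigma_3}x_3 + v \qquad \text{(p. 400 (115))},$$

$$\sigma_v^2 = \sigma_1^2\,(1 + 2r_{12}r_{23}r_{31} - r_{23}^2 - r_{31}^2 - r_{12}^2)/(1 - r_{23}^2).$$

When $n = 2$, $D = 1 - r_{12}^2 = 1 + r_{12}(-r_{12})$, $D_{11} = 1$, $D_{12} = -r_{12}$,

$$x_1 = r_{12}\frac{\sigma_1}{\sigma_2}x_2 + v, \qquad \sigma_v^2 = \sigma_1^2(1 - r_{12}^2),$$

as before.

* These conditions are equivalent to those obtained by the Method of Least
Squares, where $\sum v^2$ is minimised (see pp. 452-4).

† Written R on p. 407.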
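
The determinant solution can be tried on a small table of correlations. The sketch below is illustrative only: the function name `regression_from_correlations` and the three correlation values are assumptions made for the example, and the variables are taken in standard units so that each coefficient is $-D_{1j}/D_{11}$ and the residual ratio is $\sigma_v^2/\sigma_1^2 = D/D_{11}$.

```python
def minor(m, i, j):
    """Matrix m with row i and column j struck out."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def regression_from_correlations(r):
    """Return ([coefficients of x_j/sigma_j in x_1/sigma_1], sigma_v^2/sigma_1^2)."""
    n = len(r)
    cof = [(-1) ** j * det(minor(r, 0, j)) for j in range(n)]   # D_11 ... D_1n
    D = sum(r[0][j] * cof[j] for j in range(n))                  # equation (alpha)
    betas = [-cof[j] / cof[0] for j in range(1, n)]
    return betas, D / cof[0]

# Illustrative correlations (not taken from the text).
r12, r13, r23 = 0.6, 0.5, 0.4
betas3, ratio3 = regression_from_correlations([[1, r12, r13],
                                               [r12, 1, r23],
                                               [r13, r23, 1]])
betas2, ratio2 = regression_from_correlations([[1, 0.56], [0.56, 1]])
print(betas3, ratio3, betas2, ratio2)
```

With two variables the routine returns the coefficient $r$ and the ratio $1 - r^2$, agreeing with the earlier section.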
Supplement IX. — Frequency of the Second Moment in
Small Samples.

(See pp. 416-421.) 

Great importance is attached in modern work to equation
(i) in the following, and it is well to show its relationship to
the formulae on pages 417, 420, 421 and 451.

Let $x_1$, $x_2$ . . . $x_n$ be $n$ quantities selected at random
from a normal group whose average and second moment are
zero and $m_2$.

Write $\sum_{t=1}^{n} x_t = n\bar{x}$, and $\sum x_t^2 - n\bar{x}^2 = n\mu_2$.

Required the chance of obtaining $\bar{x}$, $\mu_2$.

The chance that . . . $x_t$ to $x_t + dx_t$ . . . should be drawn is

$$(2\pi m_2)^{-n/2}\, e^{-\sum x_t^2/2m_2} \ldots dx_t \ldots$$

$$= (2\pi m_2)^{-n/2}\, e^{-n\bar{x}^2/2m_2}\, e^{-n\mu_2/2m_2} \ldots dx_t \ldots$$

$$= C \cdot F(\bar{x}) \cdot e^{-n\mu_2/2m_2}\, dx_1 \ldots dx_t \ldots dx_n .$$

Here $F(\bar{x})$ is the normal error function with standard
deviation $\sqrt{m_2/n}$ (p. 290 (41)). $\mu_2$, and therefore $\dfrac{n}{m_2}\mu_2$, is a
homogeneous quadratic function of $n$ quantities, or, if $\bar{x}$ is
given, of $n - 1$ quantities, and is analogous to $\chi^2$ in p. 493
below. Using the same transformation as there, we have
Chance of obtaining $\mu_2$ with a given $\bar{x}$ is

$$K \cdot F(\bar{x})\, e^{-n\mu_2/2m_2}\, (\sqrt{\mu_2})^{n-2}\, d\sqrt{\mu_2} = K_1 \cdot F(\bar{x})\, e^{-n\mu_2/2m_2}\, \mu_2^{(n-3)/2}\, d\mu_2,$$

where $K$ and $K_1$ are constants.

Hence the chance of obtaining $\mu_2$ for any $\bar{x}$ is found by
integrating $F(\bar{x})$ from $-\infty$ to $+\infty$, and is

$$P = K_2 \cdot e^{-n\mu_2/2m_2} \cdot \mu_2^{(n-3)/2}\, d\mu_2, \text{ where } K_2 \text{ is constant} \qquad \text{(i)}$$

[Here $\dfrac{1}{P}\dfrac{dP}{d\mu_2} = \dfrac{\mu_2 - (n - 3)m_2/n}{-(2m_2/n)\,\mu_2}$, which is Pearson's
Type III. Cf. page 344 (84), with
$b_0 = b_2 = 0$, $y = P$, $x = \mu_2$, $b_1 = -2m_2/n$, $a = -(n - 3)m_2/n$.]

Write $u = n\mu_2/2m_2$ and $p = \tfrac{1}{2}(n - 1)$; then

$$P = K_3\, e^{-u}\, u^{p-1}\, du \qquad \text{(ii)}$$

Write $M_t$ for the $t$th moment about the origin of this curve,
$M_0$ being its area.

$$M_0 \cdot M_4 = K_3\int_0^\infty e^{-u}u^{p+3}\,du = K_3\left[-e^{-u}u^{p+3}\right]_0^\infty + (p + 3)M_3M_0$$

$$= (p + 3)M_3M_0 = \text{(similarly) } (p + 3)(p + 2)M_2M_0 = (p + 3)(p + 2)(p + 1)M_1M_0 = (p + 3)(p + 2)(p + 1)pM_0 .$$

$M_1 = p$, $M_2 = p(p + 1)$, $M_3 = p(p + 1)(p + 2)$,
$M_4 = p(p + 1)(p + 2)(p + 3)$.

Referred to the average $M_2' = p$, $M_3' = 2p$, $M_4' = 3(p^2 + 2p)$
(p. 251).

$\kappa_1 = M_3' \div M_2'^{3/2} = 2/\sqrt{p}$. $\kappa_2 = M_4' \div M_2'^2 = 3\left(1 + \dfrac{2}{p}\right)$.

Hence the average of $\mu_2 = \dfrac{2m_2}{n}(\text{average } u) = \dfrac{2m_2}{n}\,p = \dfrac{n - 1}{n}\,m_2$
(Cf. p. 342, note 1); the standard deviation of
$\mu_2 = \dfrac{2m_2}{n}\sqrt{p} = \dfrac{m_2}{n}\sqrt{2(n - 1)}$ (Cf. p. 417 (121), where $n$ is
large), and the measures of skewness and kurtosis are the same
for $\mu_2$ and for $u$, viz.

$$\kappa_1 = \frac{2\sqrt{2}}{\sqrt{n - 1}}, \qquad \kappa_2 = 3 + \frac{12}{n - 1} = 3\left(1 + \frac{4}{n - 1}\right).$$

Notice that equation (i) involves $m_2$, the second moment
of the universe, which is unknown in the ordinary process of
sampling.

It is found, however, that the ratio of two values of $\mu_2$
from two samples has a frequency independent of $m_2$, and this
is the basis of "variance" analysis. We have not, however,
got over the limitation that the original group is normal,
which is very important when $n$ is small. When $n$ is great,
however, we know from the analysis pp. 450-2, that the
distribution tends to normality, whatever the original group,
with standard deviation $\sqrt{(m_4 - m_2^2)/n}$, where $m_4$ and $m_2$ are
from the original group. In that analysis we should write
$a_t = x_t^k - k\mu_{k-1}x_t$, since the moment is computed from the
average of the sample.

Also when $n$ is great, Type III tends rapidly to normal
distribution, as is suggested by the values of $\kappa_1$ and $\kappa_2$ given
above. To verify this, write $v = (u - p) \div \sqrt{p}$, thus referring
$u$ to its average, with standard deviation as unit.

After some reduction we have $P = K_4 \cdot e^{-v\sqrt{p}}\left(1 + \dfrac{v}{\sqrt{p}}\right)^{p-1}$,
where $K_4 = 1/\sqrt{2\pi}$ nearly, if $p$ is integral.

$$P = K_4 \exp\left\{-v\sqrt{p} + (p - 1)\left(\frac{v}{\sqrt{p}} - \frac{v^2}{2p} + \frac{v^3}{3p\sqrt{p}}\right)\right\},$$

neglecting terms of order $\dfrac{1}{p}$ in comparison with $\dfrac{1}{\sqrt{p}}$,

$$P = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}v^2}\left\{1 - \frac{\kappa_1}{2}\left(v - \tfrac{1}{3}v^3\right)\right\},$$

the second approximation to the normal equation (p. 295. Cf. p. 345 (86)), since
$\tfrac{1}{2}\kappa_1 = 1/\sqrt{p}$, and $\kappa_1^2$ is negligible.

[See R.S.S. Journal, 1931, Irwin, pp. 284-6; Econometrica,
1935, Fisher, pp. 353-5; and the references there given, for
the development and use of equation (i).]
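
The moment relations for the curve $e^{-u}u^{p-1}$ lend themselves to a direct numerical check. The following sketch assumes the substitution $u = n\mu_2/2m_2$, $p = \tfrac{1}{2}(n-1)$; the function names and the trial values $n = 10$, $m_2 = 4$ are hypothetical choices for illustration.

```python
import math

def raw_moment(p, t):
    """t-th moment about the origin of e^(-u) u^(p-1): p(p+1)...(p+t-1)."""
    out = 1.0
    for k in range(t):
        out *= p + k
    return out

def central_moments(p):
    """Second, third and fourth moments referred to the average."""
    m1, m2, m3, m4 = (raw_moment(p, t) for t in (1, 2, 3, 4))
    c2 = m2 - m1 ** 2
    c3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    c4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return c2, c3, c4

def second_moment_summary(n, m2):
    """Average, standard deviation, kappa_1 and kappa_2 of the sample
    second moment for samples of n from a normal group."""
    p = (n - 1) / 2
    c2, c3, c4 = central_moments(p)
    scale = 2 * m2 / n                      # mu_2 = scale * u
    return {"mean": scale * p,
            "sd": scale * math.sqrt(c2),
            "kappa1": c3 / c2 ** 1.5,
            "kappa2": c4 / c2 ** 2}

s = second_moment_summary(10, 4.0)
print(s)
```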


Supplement X. — Standard Deviations of Percentiles, 
Mean Deviation, and Mean Difference. 

(With p. 417.) 

Percentiles . 

Let $y = f(x)$ be a continuous frequency curve ranging from
$h$ to $k$, such that $\int_h^k f(x)\,dx = 1$.

Write $p = \int_h^x f(x)\,dx$ = proportion of cases below value $x$,
so that $x$ is the $(100p)$th percentile of the distribution.

Then $dp = f(x)\,dx = y\,dx$, and the increase in the value of $x$
corresponding to a small increase in $p$ is approximately

$$\Delta x = \frac{\Delta p}{y} \qquad (1)$$

In a sample of $n$ objects, let the observed proportion below
$x$ be $p + \Delta p$. In repeated samples the average of $\Delta p$ is zero,
and

$$\text{mean } (\Delta p)^2 = p(1 - p)/n \qquad (2)$$

Then, if $n$ is great, the relation of the error in $x$ obtained
from the sample to $\Delta p$ is given by (1), and therefore from (2)
${}_p\sigma_x^2 = p(1 - p)/ny^2$, where ${}_p\sigma_x$ is the standard deviation of
the errors of $x$ deduced from the sample.

Thus for the $(100p)$th percentile ${}_p\sigma_x = \dfrac{1}{y}\sqrt{\dfrac{p(1 - p)}{n}}$ is the
standard deviation in the scale reading.

If the frequency curve is normal $y = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}$.

                         In normal curve

   Percentile, p    √(p(1-p))     x/σ *      yσ      pσx ÷ σ/√n †

   .1 or .9           .3          1.281     .176        1.71
   .2 or .8           .4          0.842     .280        1.43
   Quartiles          .433        0.6745    .317        1.36
   .3 or .7           .458        0.5233    .347        1.32
   .4 or .6           .490        0.2533    .386        1.27
   Median             .5          0         .399        1.25

* Value of z corresponding to ½ - p on p. 271.

† √(p(1-p)) ÷ yσ. All values in the last three columns are approximate.

Thus the standard deviation of the median is $1.25\sigma/\sqrt{n}$
approx., that is very nearly 5/4 of the standard deviation of the
average (p. 289 (38)).

To find the standard deviation of the interval observed
between percentiles $100p$ and $100q$ symmetrically placed,
arrange the divisions thus

   Proportions found    p + dp_1     |  q - p - dp_1 - dp_2  |  p + dp_2
   Range                h to x_1     |  x_1 to x_2           |  x_2 to k.

Then mean $dp_1 \cdot dp_2 = -p^2/n$, from p. 419, line 8.

The interpercentile distance is $x_2 - x_1 = u$ (say), and we
can argue as before that $y \cdot du = dp_1 + dp_2$, where $y$ is the
ordinate corresponding to either percentile.

$$\therefore\; y^2\sigma_u^2 = \text{mean}\{(dp_1)^2 + (dp_2)^2 + 2\,dp_1 \cdot dp_2\} = (pq + pq - 2p^2)/n,$$

where $\sigma_u$ is the standard deviation required.

For example, for the interquartile range $\sigma_u = 1/2y\sqrt{n}$.

For the probable error ($\rho$), that is half the interquartile
range, $\sigma_\rho = 1/4y\sqrt{n}$, $= .786\sigma/\sqrt{n}$, if the distribution is normal,
$= 1.18\rho/\sqrt{n}$, since $\rho = .6745\sigma$ (p. 272 (26)),

while, also in the normal curve, the standard deviation of the
standard deviation is given by

$$\sigma_\sigma = \sigma/\sqrt{2n} = .707\sigma/\sqrt{n} \quad \text{(p. 420 (127))}.$$

[Compare Yule, An Introduction to the Theory of Statistics,
pp. 337-8 and 343.]
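
The last column of the percentile table, and the factor for the probable error, can be recomputed from the normal curve. This is a minimal sketch using the standard library's `statistics.NormalDist`; the function name `percentile_sd_factor` is hypothetical, and the figures are for a normal parent only.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()          # zero average, unit standard deviation

def percentile_sd_factor(p):
    """Standard deviation of the (100p)th percentile, expressed as a
    multiple of sigma/sqrt(n): sqrt(p(1-p)) divided by y*sigma."""
    z = STD_NORMAL.inv_cdf(p)      # x/sigma for the percentile
    y_sigma = STD_NORMAL.pdf(z)    # ordinate of the unit normal curve
    return sqrt(p * (1 - p)) / y_sigma

for p in (0.1, 0.2, 0.25, 0.3, 0.4, 0.5):
    print(p, round(percentile_sd_factor(p), 2))

# Probable error: sigma_rho = 1/(4 y sqrt(n)), with y taken at the quartile.
probable_error_factor = 1 / (4 * STD_NORMAL.pdf(STD_NORMAL.inv_cdf(0.75)))
```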

Mean Deviation.

Consider the mean deviation, $\eta_u$, about a central position $u$.
The deviations of a frequency group, all taken positively,
form a frequency group ranging upwards from zero, whose
average is $\eta_u$, and second moment about zero is given by

mean $(x - u)^2 = \sigma^2 + (u - \bar{x})^2$, where $\sigma$ and $\bar{x}$ are the standard
deviation and average of the original group.

$\therefore$ Standard deviation of the new group is

$$\sqrt{\sigma^2 + (u - \bar{x})^2 - \eta_u^2} = s,$$

say. Hence, in repeated samples of $n$ each, the standard
deviation of the average, that is of $\eta_u$, is $s/\sqrt{n}$ (see p. 289 (38)).

If the deviations are taken originally from the average,
$u = \bar{x}$, and $\sigma_\eta = \sqrt{(\sigma^2 - \eta^2)/n}$.

For the normal curve $\eta = \sigma\sqrt{2/\pi}$ (p. 269 (24)), and
$\sigma_\eta = .603\sigma/\sqrt{n} = .756\eta/\sqrt{n}$, approximately.

In the second approximation to the normal curve the
distance from average to median is of order $1/\sqrt{n}$ (p. 444
(145)), and when $1/n$ is neglected, the standard deviation
of the mean deviation from the median is very nearly the
same as $\sigma_\eta$.


Mean Difference (see pp. 114-5).

The standard deviation of the mean difference for the
normal curve is given by

$$\sigma_g = \frac{2\sigma}{\sqrt{n}}\sqrt{\frac{1}{3} + \frac{2\sqrt{3} - 4}{\pi}}$$

$= .807\sigma/\sqrt{n} = .715g/\sqrt{n}$ approx., since $g = \eta\sqrt{2} = 2\sigma/\sqrt{\pi}$.

[Bowley and Wold: Congrès International des Mathématiciens,
Oslo, 1936].
Supplement XI. — Standard Deviation of the Correlation
Coefficient.

(With p. 422.)


It is worth while to evaluate in a simple case the complete
formula for $\sigma_r$ given in the last line of p. 422.

Let $x = u_1 + u_2 + \ldots + u_m$, $y = v_1 + v_2 + \ldots + v_m$, all
the variables being measured from the averages.

Let each $u$ and $v$ have the same standard deviation $\sigma$ and
fourth moment $\mu_4$, and with $\mu_4 = (3 + \epsilon)\sigma^4$, where $\epsilon$ is zero if
the curve is normal.

Let the $u$'s be uncorrelated and the $v$'s be uncorrelated.

Let $u_s$ and $v_t$ ($s \neq t$) be uncorrelated, so that
Mean $u_sv_t$ = 0 = mean $u_sv_t^3$. Mean $u_s^2v_t^2 = \sigma^4$, $s \neq t$.

Let each pair ($u_t$, $v_t$) be correlated by the same frequency
distribution, so that mean $u_t^3v_t = (3\rho + d)\sigma^4$ = mean $u_tv_t^3$
and mean $u_t^2v_t^2 = (2\rho^2 + 1 + \delta)\sigma^4$,
where mean $u_tv_t = \rho\sigma^2$.

$d$ and $\delta$ are zero if the correlation surface is normal
(p. 362 (106)).

Then with the notation of p. 422

$\lambda = M_{20}{}^* = m\sigma^2 = M_{02} = \mu$,

where $m$ is the number of independent $u$'s or $v$'s.

$M = M_{11}{}^* = \text{sum mean } u_tv_t = m\rho\sigma^2 = \rho\lambda$, and $r = \rho$.

$\lambda_4 - 3\lambda^2 = M_{40}{}^* - 3\lambda^2 = m(\mu_4 - 3\sigma^4)$, from p. 292,

$\quad = m\sigma^4\epsilon = \dfrac{1}{m}\lambda^2\epsilon = M_{04} - 3\mu^2$.

$M_{31}{}^* = \text{mean}\,\{(\sum u_s)^3 \times \sum v_t\}$

$\quad = \sum(\text{mean}\,(u_t^3v_t)) + 3\sum(\text{mean}\, u_s^2 \cdot u_tv_t)$ + zero terms

$\quad = m(3\rho + d)\sigma^4 + 3m(m - 1)\sigma^2 \cdot \rho\sigma^2$.

$$\therefore\; \frac{M_{31}}{\lambda M} = 3 + \frac{d}{mr} = \frac{M_{13}}{\mu M}.$$

$M_{22}{}^* = \text{mean}\,\{(\sum u_s)^2 \times (\sum v_t)^2\}$

$\quad = \sum(\text{mean}\,(u_t^2v_t^2)) + 2\sum(\text{mean}\,(u_s^2v_t^2)) + 4\sum(\text{mean}\,(u_su_tv_sv_t))$ + zero terms,

the last two sums being taken over pairs $s$, $t$,

$\quad = m(2\rho^2 + 1 + \delta)\sigma^4 + m(m - 1)\sigma^4 + 2m(m - 1)\rho^2\sigma^4$.

$$\therefore\; \frac{M_{22}}{\lambda^2} = 2r^2 + 1 + \frac{\delta}{m}.$$

* These are obtained by multiplying appropriate powers of x and y and
taking the means.
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Substitute these values in the formula (p. 422) 

r £{( 2 + ? +** + i - 3 - 3 + + \) 


Hence, if $m$ (the number of independent elements) is considerable,
or if the frequency curves and surfaces of the $u$'s, $v$'s
and pairs of $u$, $v$ approach normality, the correction to the
formula $(1 - r^2)/\sqrt{n}$ is slight, and in any case is of the order
$1/2m$. But, if $m$ is small, the correction may be considerable
and be either positive or negative.

Other cases, of a slightly more complex kind, are worked 
out in a Note in the Journal of the American Statistical Associa- 
tion, 1928, pp. 31-4, of which the above is a modified version. 


Supplement XII.— The Method of Confidence Belts. 


(With pages 412 seq.) 


Many statisticians wish to be independent of any hypothesis 
about à priori probability, when they draw inferences from
the results of sampling. The method of confidence belts or 
fiduciary limits, introduced by Prof. R. A. Fisher, has this 
purpose, and we proceed to describe one aspect of it in the 
simplest case. 

Write $\Pi(p, x) = {}_nC_x\, p^x q^{n-x}$ for the chance of obtaining $x$
white balls in $n$ independent selections from an urn containing
white and black in the proportion $p : q$ ($p + q = 1$), where $p$
remains unaffected by the drawing. Let $pn$ be sufficiently
great to allow us to write $\Pi = \dfrac{1}{\sqrt{2\pi pqn}}\,e^{-\frac{1}{2}z^2}$, where $z = (x - pn)/\sqrt{pqn}$.

Choose some limit, say .05, and find from the normal Table
(p. 271) the value of $z$ that makes $2\displaystyle\int_z^\infty \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}\,dz = .05$, approx.

This value is 1.96 . . . , so that $\dfrac{x}{n} = p \pm 1.96\sqrt{\dfrac{pq}{n}}$.

In any experiment $n$ is known. Take the case $n = 100$,
and compute $x/n$ for a series of values of $p$.

      p        x/n          p        x/n

      0        0           1.0       1
     .1       .1 ± .06      .9      .9 ± .06
     .2       .2 ± .08      .8      .8 ± .08
     .3       .3 ± .09      .7      .7 ± .09
     .4       .4 ± .096     .6      .6 ± .096
     .5       .5 ± .1

Take rectangular axes, OX, OP and plot the values of $p$ and
$x/n$. We obtain two curves (in this case ellipses) enclosing a
space, which is called a "confidence belt."

Suppose $x$ to be found to be 20 in one experiment. Mark
OM = .2, and through M draw a perpendicular to OX to
intersect the ellipses in L and K.

ML and MK are the values of $p$ given by

$$.2 = p \pm 1.96\sqrt{pq/n},$$

that is, .13 and .29 approx.
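
The two steps just described (the edges of the belt for a given $p$, and the limits ML, MK for a given $x$) can be sketched as follows. The function names are hypothetical; the limits are found by squaring $x/n = p \pm l\sqrt{pq/n}$ and solving the resulting quadratic in $p$, which rests on the same normal approximation as the text.

```python
import math

def belt_edges(p, n, l=1.96):
    """Values of x/n bounding the belt at a given p: p -/+ l*sqrt(pq/n)."""
    half = l * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def limits_for_p(x, n, l=1.96):
    """Roots in p of (x/n - p)^2 = l^2 * p(1-p)/n, i.e. ML and MK."""
    f = x / n
    a = 1 + l * l / n
    b = -(2 * f + l * l / n)
    c = f * f
    disc = math.sqrt(b * b - 4 * a * c)
    return (-b - disc) / (2 * a), (-b + disc) / (2 * a)

low_edge, high_edge = belt_edges(0.4, 100)      # AH and AJ of the text
ml, mk = limits_for_p(20, 100)                  # limits when x = 20, n = 100
print(round(low_edge, 3), round(high_edge, 3), round(ml, 2), round(mk, 2))
```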

Now suppose that there are a number of urns from which
the drawing may have been made, and that the proportion
of urns, in which the ratio of white to black is $p_1 : q_1$, is $P_1$, so
that $\sum(P_1) = 1$, the sum being extended over all values of $p$
from 0 to 1. We know nothing about $P_1$; it may vary continuously
in any way, or be located at one or more particular
points.

The chance of choosing an urn ($p_1 : q_1$) and drawing $x$ white
balls is $P_1 \times \Pi(p_1, x) = Z$. The sum of $Z$ over all values
$p = 0$ to 1, and $x = 0$ to $n$ is unity.

Conceive a surface determined by the extremities of lines
perpendicular to the plane XOP equal to $Z$, and suppose a
vertical cylinder through the ellipses bounding the belt of
confidence.

This cylinder includes .95 of the area of every vertical
section perpendicular to OP. For example, it includes the
values of $Z$ standing on HJ, the plane through AB, drawn
through $p_1$ = OA = .4 (AH = .304, AJ = .496, from the
little table above).

The area of this section is $P_1 \times \sum_{x=0}^{n} \Pi(p_1, x) = P_1 \times 1 = P_1$,
which may have any value from 0 to 1, but in every case the
proportion of that value included is .95. Therefore the proportion
of the whole volume bounded by the surface which
is included in the cylinder is .95, however $P_1$ may be distributed.



Diagram E. [Confidence belt for n = 100, plotted on the axes OX and OP.]

For any one drawing of $n$ balls in which $x$ ($= n \times$ OM) white
balls are found, obtain the points L and K.* From this result
write down the hypothesis, "$p$ the unknown proportion in
the urn from which the drawing took place is between ML and
MK." We have no means of testing the accuracy, or probability
of the truth, of this hypothesis in any one case. But
if drawings are repeated an indefinitely large number of times,
till we may assume that all the urns are engaged in the proportions
$P_1$'s, and from each of them the frequency of the
resulting $x$'s is given by the appropriate $\Pi$, then the theoretical
distribution of $Z = P_1 \times \Pi$ will in the very long run tend to
be reached. Since 95 per cent. of all the results are within
the confidence belt, the hypothesis written above will be justified
in 95 per cent. of an indefinitely large number of experiments,
whatever the distribution of the urns.

* KL $= l\sqrt{4x(n - x)/n + l^2} \div (n + l^2)$, where $l$ is written for the 1.96 in
the equation above. KL diminishes as $\sqrt{n}$ increases. Its maximum is at
$x = \tfrac{1}{2}n$, for any given value of $n$, and equals $(n/l^2 + 1)^{-\frac{1}{2}}$.

But we can make no statement whatever about the probability
of $p$ from a single drawing or trial, however great $n$
may be, if we have no reasonable knowledge of, or reasonable
assumption about, the universe of urns. (Of course a series of
confidence belts can be drawn corresponding to various fiduciary
limits; .05 is taken only for numerical illustration).

A result of this kind has its use when a great number of
experiments have been made, and we do not depend on the
accuracy of the individual estimates. But if we have only
one sample, such as was obtained in the New Survey of London
Life and Labour, any action (such as the supply of sufficient
milk to children) must be based on that one; any confidence
we have in our results is based on $n$ being large, and on the
validity of the assumption as to à priori probability indicated
on p. 414. The main assumption there is that the frequency
of the occurrence of "urns," with proportions differing significantly
from the central region, is not overwhelmingly great,
and this may in many cases be known from general experience
of the "populations" with which we are concerned.

[The method of approach in the above and the framework
of the diagram are largely based on "The Use of Confidence or
Fiducial Limits Illustrated in the Case of the Binomial,"
Biometrika, Dec. 1934, C. J. Clopper and E. S. Pearson. The
reader is also referred to Dr. Neyman's paper in the R.S.S.
Journal, 1934, pp. 589-93. The general ideas came from
Professor R. A. Fisher's work, but it is not to be assumed that
he would accept the exegesis here offered without at least
serious qualifications.]
Supplement XIII. — The Test of "Goodness of Fit."

(With page 431.)


Since it is now difficult to refer to Pearson's analysis given
in 1900, it may be well to indicate the steps in it without
attempting a complete proof.

From the expression for $\chi^2$, viz.: $\dfrac{e_1^2}{m_1} + \dfrac{e_2^2}{m_2} + \ldots + \dfrac{e_n^2}{m_n}$, one
of the $e$'s may be eliminated, since $e_1 + e_2 + \ldots + e_n = 0$.
$\chi^2$ is then a homogeneous quadratic expression in $n - 1$
quantities, and can therefore be expressed by linear transformations
as the sum of $n - 1$ squares, and written

$$\chi^2 = u_1^2 + u_2^2 + \ldots + u_{n-1}^2 .$$

Then $P = I_\chi \div I_0$, where

$$I_\chi = \iint \ldots K e^{-\frac{1}{2}(u_1^2 + u_2^2 + \ldots)}\, du_1\, du_2 \ldots$$

integrated over the range of $u$'s which make $\sum(u^2)$ as great as
an assigned $\chi^2$.

The process of integration is similar to that for finding the
volume of a sphere.

Take $n = 4$.

Write $u_1 = \chi \sin\theta \cos\phi$, $u_2 = \chi \sin\theta \sin\phi$, $u_3 = \chi \cos\theta$.
Then $u_1^2 + u_2^2 + u_3^2 = \chi^2$.

$$I_\chi = \int_\chi^\infty\!\int_0^\pi\!\int_0^{2\pi} K e^{-\frac{1}{2}\chi^2}\, d\phi \cdot \sin\theta\, d\theta \cdot \chi^2\, d\chi = 4\pi K\int_\chi^\infty e^{-\frac{1}{2}\chi^2}\chi^2\, d\chi .$$

By analogy in $n - 1$ dimensions, $I_\chi = K_1\displaystyle\int_\chi^\infty e^{-\frac{1}{2}\chi^2}\chi^{n-2}\, d\chi$.

Integrate this by parts

$$\frac{1}{K_1}\, I_\chi = \left[-e^{-\frac{1}{2}\chi^2}\chi^{n-3}\right]_\chi^\infty + (n - 3)\int_\chi^\infty e^{-\frac{1}{2}\chi^2}\chi^{n-4}\, d\chi$$

$$= e^{-\frac{1}{2}\chi^2}\chi^{n-3} + (n - 3)\left\{e^{-\frac{1}{2}\chi^2}\chi^{n-5} + (n - 5)\int_\chi^\infty e^{-\frac{1}{2}\chi^2}\chi^{n-6}\, d\chi\right\}, \text{ and so on.}$$

If $n$ is even, the final integral is $\displaystyle\int_\chi^\infty e^{-\frac{1}{2}\chi^2}\, d\chi = f(\chi)$, say.

Then

$$\frac{1}{K_1}\, I_\chi = e^{-\frac{1}{2}\chi^2}\left\{\chi^{n-3} + (n - 3)\chi^{n-5} + (n - 3)(n - 5)\chi^{n-7} + \ldots + (n - 3) \ldots 5 \cdot 3 \cdot \chi\right\} + (n - 3) \ldots 3 \cdot 1 \cdot f(\chi)$$

and $\dfrac{1}{K_1}\, I_0 = (n - 3) \ldots 3 \cdot 1 \cdot \sqrt{\dfrac{\pi}{2}}$, since $f(0) = \sqrt{\dfrac{\pi}{2}}$.

Reverse the order of the expansion of $I_\chi$, and divide by $I_0$,
and we obtain the equation at the bottom of p. 430.

When $n$ is odd the final steps are simpler and we have
the equation (131).

Write $\tfrac{1}{2}\chi^2 = v$ and $n = 2m + 1$.

The formula for $n$ odd becomes

$$P = e^{-v}\left(1 + v + \frac{v^2}{2!} + \ldots + \frac{v^{m-1}}{(m - 1)!}\right)$$

$= P_0 + P_1 + \ldots + P_{m-1}$, in the notation of p. 285 (29).

In such a series continued indefinitely, the greatest term is 
e ~ 9 . v 9 l(v)\, and since the series is the limit of a binomial expan- 
sion, the mode (given by this term) may be expected to be less 
than the median by approximately $ko, (p. 444 (145)) which 
= 1, when a — 's/v, k — 1 \/v (p. 285 (30) (31)). 

If $v = m - 1$, we have the greatest term, and $v = m$ gives
the median approximately.

Hence $P = \tfrac{1}{2}$ for some value of $\tfrac{1}{2}\chi^2$ between $m - 1$ and $m$,
that is for some value of $\chi^2$ between $n - 3$ and $n - 1$.

This is a rough explanation of the cause of the relation
stated in the first line below the Table on p. 431. It can, of
course, be verified by a full Table of $n$, $\chi^2$ and $P$.


Effect of Restrictions in Sampling.

The linear relation between the $e$'s (p. 429 (130)) has the
same effect as reducing the number of independent variables
from $n$ to $n - 1$, and results in the index $n - 2$ in the
expression for $P$. Every additional linear relation reduces the
index one unit further, since it makes possible the elimination
of one more $e$. If there are $c$ additional relations the $n$ at
the head of the first column of the Table on p. 431 must be
interpreted as $n - c$; e.g. if $n = 10$ and $c = 2$, $P = .54$
corresponds to $X^2 = 6$.
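
The worked figure can be reproduced as follows. This is a hedged sketch: `p_value` and `p_with_restrictions` are hypothetical names, and the two closed forms used (the finite exponential series for odd $n$, the expansion with the error-function tail for even $n$) are assumed to be those of the derivation preceding this section.

```python
import math

def p_value(n, x_squared):
    """P = I_X / I_0 with index n - 2 in the integral, for an integer n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if n % 2:
        # n odd: n = 2m + 1 and P = e^{-v}(1 + v + ... + v^(m-1)/(m-1)!)
        m, v = (n - 1) // 2, x_squared / 2
        term = total = 1.0
        for k in range(1, m):
            term *= v / k
            total += term
        return math.exp(-v) * total
    # n even: expansion obtained by integrating by parts
    x = math.sqrt(x_squared)
    coeff, bracket, power = 1.0, 0.0, n - 3
    while power >= 1:
        bracket += coeff * x ** power
        coeff *= power
        power -= 2
    f0 = math.sqrt(math.pi / 2)           # f(0)
    fx = f0 * math.erfc(x / math.sqrt(2))  # f(X)
    return (math.exp(-x_squared / 2) * bracket + coeff * fx) / (coeff * f0)

def p_with_restrictions(n, c, x_squared):
    """With c additional linear relations, read the table at n - c instead of n."""
    return p_value(n - c, x_squared)

if __name__ == "__main__":
    print(p_value(10, 6.0))                 # no additional restriction
    print(p_with_restrictions(10, 2, 6.0))  # two additional restrictions
```

With $n = 10$, $c = 2$ and $X^2 = 6$ the second figure rounds to .54, as stated; without the two restrictions the same $X^2$ would give a markedly larger $P$.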
Thus in a contingency table (p. 373) it is often assumed
that the sub-totals, $n_1, n_2, \dots, m_1, m_2, \dots$ are not subject
to variation. When this is so, and $l$ and $c$ are the numbers of
lines and columns in the table, we have $c - 1$ and $l - 1$
additional linear relations between certain of the $e$'s. If the
sub-totals are variable, $n$, the number of compartments, $= c \times l$,
and the index of $X^2$ is $cl - 2$; with the restrictions it is
$cl - 2 - (c - 1) - (l - 1) = cl - c - l$. (R.S.S. Journal, 1922,
R. A. Fisher, p. 88).

Thus in the case $c = 2$, $l = 2$,

$$P = \sqrt{\frac{2}{\pi}} \int_X^\infty e^{-\frac{1}{2}X^2} X^0 \, dX,$$

where $X^2 = NV/\ldots$, as in fact is given on p. 372.* If, however,
the ratios of the sub-totals were not known, but were based
on the observations of one random group taken from a larger
universe, we should have $X^2$ instead of $X^0$.
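
The counting of the index may be illustrated by a short sketch. The function names are hypothetical; the only assumptions are the rule $cl - 2 - (c - 1) - (l - 1)$ quoted above and the fact that, with index 0, the integral for $P$ reduces to the complementary error function.

```python
import math

def contingency_index(lines, columns, fixed_subtotals=True):
    """Index of X in the integral for P, for a table of `lines` by `columns`."""
    index = lines * columns - 2
    if fixed_subtotals:
        # c - 1 and l - 1 additional linear relations
        index -= (columns - 1) + (lines - 1)
    return index

def p_two_by_two(x_squared):
    """Index 0: P = sqrt(2/pi) * integral from X to infinity of exp(-t^2/2) dt."""
    return math.erfc(math.sqrt(x_squared / 2))

if __name__ == "__main__":
    print(contingency_index(2, 2))                         # sub-totals fixed
    print(contingency_index(2, 2, fixed_subtotals=False))  # sub-totals variable
    print(p_two_by_two(3.841))
```

For the fourfold table the index is 0 when the sub-totals are fixed and 2 when they are variable, which is the contrast between $X^0$ and $X^2$ drawn in the text.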

Here we are in the difficulty that unless we know the proportions
in the universe, we have not the data for calculating
$\alpha, \beta, \dots$ on which $X^2$ depends.

A similar question arises when we calculate the constants
of a frequency curve from the sample to which it is to be fitted.
Thus in the example on pp. 274-5 we may assume $p = \ldots$,
$\sigma = 3.535$, and the only doubt about the test is how fine the
grading of the results should be; but in the examples on
pp. 304-6, we can only determine $\bar{x}$ and $\sigma$ from the observations.

A considerable difference of opinion has existed on the 
question whether this process involves additional restric- 
tions. It appears to depend on the hypothesis made as to 
the method of repeated sampling, where in fact we have only 
one sample. 

* But there the chance of a positive deviation (or excess) is taken, so that
we have $\frac{1}{2}P$.

The different hypotheses are:

1. Suppose a great number of samples to be taken from the
same universe, and that they are not restricted to have the
average and other moments of the given sample. Then there
is no restriction, but we do not know the universe. We can
compute $X^2$ and $P$ for any universe we like to define; among
the results from a universe of the given form (such as normality)
which is to be tested, we may choose that which minimises
$X^2$; then we could say that the chance that the sample would
arise from the universe which fitted it best is $P$, with no
restriction. In Professor Karl Pearson's words “what we actually
do is to replace the accurate value of $X^2$, which is unknown
to us and cannot be found, by an approximate value,” when we
select the constants on this or any other principle. If the
number of objects in the sample is large $X^2$ will vary, under
ordinary choice, only by a quantity comparable with …,
which in fact has already been neglected (p. 454).

2. It is supposed that samples are drawn again and again
and $X^2$ is computed for each sample by equating the moments
in the universe to those in that sample. Then the samples
are not supposed to be drawn from an invariable universe.
In such a case the analysis of pp. 429-31 does not apply, for
the $m$'s are variable; we have a number of single examples
for a series of values of $X^2$.

3. The samples are drawn from the same universe, and $X^2$
is calculated subject to one or more linear restrictions. The
index in the integral is then reduced. The resulting $P$ =
function ($X$) shows the distribution of chances of samples
restricted to definite moments or other constants, that is from
a universe in which certain quantities are taken as known.
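
The practical difference between taking the universe as known and taking its constants from each sample can be seen in a small sampling experiment. The sketch below is an illustration under stated assumptions, not a resolution of the question: the universe is assumed normal with mean 0 and standard deviation 1, the grading into eight groups, the sample size, the number of trials and the seed are arbitrary choices, and all names are hypothetical.

```python
import bisect
import math
import random
import statistics

def normal_cdf(x, mean, sd):
    """Chance of a value below x in a normal universe."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def graded_counts(sample, edges):
    """Number of observations in each grade; grade i holds edges[i-1] < x <= edges[i]."""
    counts = [0] * (len(edges) + 1)
    for x in sample:
        counts[bisect.bisect_left(edges, x)] += 1
    return counts

def x_squared(counts, edges, mean, sd):
    """Sum of (observed - expected)^2 / expected over the grades."""
    size = sum(counts)
    cuts = [-math.inf] + list(edges) + [math.inf]
    total = 0.0
    for observed, low, high in zip(counts, cuts, cuts[1:]):
        expected = size * (normal_cdf(high, mean, sd) - normal_cdf(low, mean, sd))
        total += (observed - expected) ** 2 / expected
    return total

def average_x_squared(trials=1000, size=200, seed=1):
    """Average X^2 with the universe fixed, and with constants taken from each sample."""
    rng = random.Random(seed)
    edges = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]  # assumed grading, 8 groups
    fixed = fitted = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(size)]
        counts = graded_counts(sample, edges)
        fixed += x_squared(counts, edges, 0.0, 1.0)
        fitted += x_squared(counts, edges,
                            statistics.fmean(sample), statistics.pstdev(sample))
    return fixed / trials, fitted / trials

if __name__ == "__main__":
    print(average_x_squared())
```

With the universe fixed in advance the average $X^2$ lies near the number of groups less one; when the mean and standard deviation are taken from each sample the average is smaller. How that reduction ought to be reflected in the index is the point on which the hypotheses differ.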

Hypothesis 3 is certainly appropriate in a contingency 
table where the sub-totals are in fact known. In other classes 
of cases we are entitled, in my opinion, to choose the universe 
in any way we please, with or without reference to values 
based on the sample, and without modifying the universe. 
The reader may perhaps be helped to form a judgment by 
considering Gibrat’s graphic method (p. 471). If we choose
$x_0$ so as to get optically the least curvature in the line representing
the observations and then read off $a$ and $b$ and compute
$X^2$, have we any reason to say that the sample is taken under
the condition that $x_0$, $a$ and $b$ restrict it? Or is that an unfair
way of stating hypothesis 3 and its results?

[See Fisher, loc. cit., and in Economica, 1923, pp. 139-147;
Pearson, Biometrika, 1922, pp. 186-91; Bowley and Connor,
Economica, 1923, pp. 1-9, and the references there given.]
Absolute Errors, see Errors
Abstract, Annual, of Labour Statistics,
11, 54 n., 55 n., 163 n., 197
Statistical, for the United King- 
dom, 11 

Accuracy, 5, 130, 178 seq. 

of Comparisons, 193, 326 seq. 

of Statistics of International 

Trade, 50 n. 

Ages, 26, 107, 128, 130, 235; Dia- 
gram, facing 130 

Agricultural Wages, 75 seq., 84-5, 
86-7, 90 seq. 

A priori probability, see Inverse 
probability 

Arithmetic Average or Mean, 82 seq., 84
Frequency Groups, 248, 253 seq. 
Mean Cube of Error of, 287 seq. 
Normal Distribution of, 290, 314-15 
Precision of, 312 seq., 415-16 
Relative Error in, 183, 319-20 
Relative Error in Ratio of, 186, 
326, 446-7 

Standard Deviation of, 287 seq., 
300-1, 342 n., 418 seq., 452 
Association, 370 
Coefficient of, 370 
Asymmetry, 116 ; see Skewness 
Attributes, 19, 53, 259 seq., 330 seq., 
367 seq., 412 seq. 

Averages, 7, 16, 82 seq. 

Applications of, 117 seq. 

Examples : 

Measurements of Boys, 105 
Train Service, 117 
Wage Statistics, 118 seq., 126 
Graded Data, 85 
as Rates, 83 

Significance of Differences between, 
329 seq. 

see Arithmetic Averages, Geo- 
metric Mean, Median, Mode, Mov- 
ing Averages, Weighted Averages 


Bernoulli's Laws, 273-4 

Biassed Errors, 190 seq., 199 

Binomial Expansion $(p + q)^n$ :

Deduction of the Normal Law, 
261 seq., 301 

Birth Rates, 95 

Blank Forms, 15, 23, 24, 28, 39
Specimens of, 22, 32, 33, 40, 45, 46,
49 

Block Diagram, 130 

Example : Ages of Married Men, 
130 

Board of Trade Index, 201 seq. 

British Association Index-Number, 
206-7 

Budgets of Expenditure, 189, 210, 
480 

Cartograms, 141 

Census : 

Population, 18, 20 seq., 57 seq., 
128, 281, 313, 402 
Production, 27, 51 
Wage, 12, 30, 32 seq., 70 seq., 
89-90, 103 

Central Difference Formulae, 228-9, 
240-1 

Chance and Experience, 272 seq. ; see 
Probability 

Characteristics, 19, 53, 259 seq., 330 
seq., 367 seq., 412 seq. 

Coefficient, of Association, 370 
of Colligation, 370 
of Contingency, 374, 379
of Correlation, 334 seq. 

Standard Deviation of, 422-3, 
452, 488 

Partial Correlation, 400 
Partial Regression, 400 
of Regression, 362 

Standard Deviation of, 423 
Statistical, 94-5 
of Variation, 116
Collection of Material, 14, 15, 18 seq.
Examples :

Foreign Trade Statistics, 43 seq.
French Wage Statistics, 37-8
Population Census, 20 seq. 
Unofficial Investigation (Liveli- 
hood and Poverty), 39 seq. 
Wage Census, 30 seq. 

Colligation, Coefficient of, 370
Comparisons of Averages, 193, 326 
seq. 

Comparisons of Series of Figures, 
149 seq., 172 seq., 378-9; see 
Correlation 
Examples : 

Foreign Trade, 151 seq.

Marriage Rate and Employment,
174-5

Marriage Rate and Foreign 
Trade, 155 seq. 

Marriage Rate and Price of 
Wheat, 155 seq. 

Compensating Fluctuations, 148 
Concentration, 462 
Confidence Belts, 489-92 
Consumption, Index-Numbers, 212-13

Contingency, 371 seq., 374 seq. 
Coefficient of, 374, 379 
Constancy of sub-totals, 495 ; maxi- 
mum, 379 

Correlation, 62, 350 seq., 380 seq. ; see 
Partial and Multiple Correlation 
Examples : 

Heights and Weights of Children, 
385-6 

Imports and Marriage Rates, 
386 seq. 

Infantile Mortality and Popula- 
tion, 381-2 

Occurrences of pairs of digits, 
384-5 

Pairs of totals of letters, 388 
seq. ; Diagram, 390 
Selection of digits at random, 

381 

Size of herrings and number of 
rings, 383 seq. 

Coefficient of, 354 seq. 

Standard Deviation of, 422-3, 
452, 488

Normal Correlation Surface, 356 
seq. 

Comparison with Observations, 
391 seq. 

Second Approximation, 396-7 
Ratio, 365 seq., 366, 379 
of Ranks, 477 

of Time Series, 155, 342 n., 374 
seq., 386 seq., 467 
of Ungraded Variables, 367 seq. 
Variate Difference, 376-7, 388 


Correspondence between Data and 
Formulae, 426 seq., 454, 493-6 
Cost of Living, 189, 208 seq., 213
Curve :

of Error ; see Error

of Regression, 352
Curves of Frequency, 247, 343 seq. ;
see Error, Curve of 
Logarithmic, 169 seq. 

Subsidiary, 221 
Cycles of Trade, 148, 164 

Data, see Collection of Material and 
Graded Data 

Data and Formulae, Correspondence 
between, 426 seq. 

Death Rates, 95, 110 seq.

Death Rates, Makeham's Formula,
348-9

Deciles, 102 ; see Examples on 
Averages, Application of 
Demography, 7, 20 
Density, Greatest, 98 ; see Mode
Derived Functions and Finite Differ- 
ences, 224-5 

Determinants, Note on, 478 
Deviation, 104, 110 ; see Mean,
Quantile, Standard Deviation 
Diagrams : 125 seq. ; see List, xi 
Examples : 

Imports and Population, 145 
seq. 

Revenue Statistics, 143 seq. 
Historical, 142 seq. 

Pictorial, 139-40 

Difference, Mean, 114, 461-4; stan- 
dard deviation of, 487 
Differences, see Finite Differences 
Differences between Averages, Signi- 
ficance of, 329 seq. 

Dispersion, 110 seq., 248-9

Example : Death Rates, 110 seq.

Earnings, 34 seq. ; see Wages 
Economist Index-Number, 12, 205-6
Employment, see Unemployment 
Error, Law and Curve of : 
Applications of the Law of Error, 
312 seq. 

to Sampling, 277 seq. 

Area of Curve, 268 
Deduction of the General Law of 
Error, 291 seq. 

Professor Edgeworth's proof, 
295 seq. 

Proof by the Multinomial 
Theorem, 291 seq. 

Diagram, facing 454 
Examples : 

Comparison of Results of 
Experiments of Chance with 
Normal Distribution, 274 seq. 
Error, Law and Curve of (cont.) :

Fitting of Normal Curve and
Second Approximation, 304
seq., 314-15

Kurtosis, 455

Limited Universe, or Selections not 
Independent, 282 seq., 300 
Limit of Binomial Expansion 
$(p + q)^n$, 261 seq., 301
Mean deviation, 269 
Mean difference, 464 
Probable error, 270 
Second Approximation, 267, 295, 
302 

Diagram, 443 

Moments and Constants, 441 seq.
Standard Deviation of Average and 
Standard Deviation, 421 
Table of Areas, Normal Integral, 
271 

Second Approximation, 303 
Transformations of, 470 seq. 

Error of Mean Square, see Standard 
Deviation 

Error, Probable, see Probable Error 
Error, Absolute, in Weighted Sums 
and Averages, 316-17 
Biassed and Unbiassed, 190 seq., 
199 

Relative, 180 seq., 318 seq., 446 
seq. 

Euler-Maclaurin Theorem, 436 seq. 
Examination of Results, 14, 16 
Exports, see Foreign Trade 

Finite Differences, 222 seq. 

and Derived Functions, 224-5 
Fitting of Normal Curve and Second 
Approximation, 304 seq., 314-15 
Fluctuations : 

Compensating, 148 ; see Periodic 
Fluctuations 
Random, 148 
of a Series in Time, 148-9 
Undulatory, 148 

Force of Mortality, Makeham’s 
formula, 348-9 

Foreign Trade, 43 seq., 132 seq., 145 
seq., 151 seq., 155 seq., 170-1, 173, 
201 seq., 234, 386 seq. ; Diagrams, 
facing 134, 146, 152, 155, 171 
Forms of Enquiry, see Blank Forms 
French Wage Census, 37-8 
Frequency Curves, 247, 343 seq. ; see 
Error, Curve of 

Frequency Groups, 110, 246 seq.
Central Position, 248 
Description of, 248 
Dispersion, 248-9 
Kurtosis, 455-6, 484 
Measurement of, 246 seq. 
Symmetry and Asymmetry, 249


Geometric Mean, 107-8, 105
Goodness of Fit, 426 seq., 454, 493-6 
Graded Data, 85, 113, 130, 247 
Sheppard's Corrections for, 253, 
439 seq.

Graphic Methods, 125 seq., 378-9 
for Interpolation, 219 seq., 231 
Great Numbers, 8 
Law of, 287 seq., 298 
Groups, see Frequency Groups 
Groups, Limits of, 66 

Histograms, 130 

Example : Ages of Married Men, 
130 

Historical Diagrams, 142 seq. 

Imports, see Foreign Trade 
Incomes, Pareto’s Law, 346 seq., 
460, 462 

Index-Numbers, 171, 196 seq. 

Board of Trade Index, 202 seq. 
British Association Index, 206-7 
Consumption, 212-13 
Cost of Living, 208 seq. 

Economist Index, 205-6 
Sauerbeck’s, 171, 198, 205-6, 254,
324 seq. 

Wage Statistics, 213 
Interpolation, 214 seq. 

Example: Rates of Wages, 215-17, 
218 

Algebraic Treatment, 221 seq. 

Numerical Examples, 233 seq. 
Correction of Observations, 237 
Graphic Method, 219 seq., 231 
Periodic Figures, 220-1 
Subsidiary Curves, 221 
Inverse or A priori Probability, 409
seq., 489 

Kurtosis, 455-6, 484 

Labour Statistics, Annual Abstract of, 
11, 54 n., 55 n., 163 n., 197
Lagrange’s Interpolation Formula, 
229, 235-6 

Law of Error, see Error 

Great Numbers, 287 seq., 298 
Small Numbers, 284 seq. 

Law of Proportional Effect, 473-4 
Least Squares, 137, 239, 364, 452 seq., 
481 

Livelihood and Poverty, 10, 39, 402 
Logarithmic Curves, 169 seq. 

Example : Foreign Trade Statistics, 
170-1

Logarithmic Mean, 107, 205
Logarithms, Table of, 176-7 
Logistic Curve, 468 seq. 

Makeham’s Formula, 348-9 
Maps, Statistical, 141
Marriage-Rate, 95, 156, 174, 338-9,
386 seq.; Diagrams, facing 155,
174

Material, Collection of, 14, 15, 18 
seq. 

Maximum Ordinate, see Mode
Mean Cube Deviation, 249 
of a Sum or Average, 289 
Mean Deviation, 111, 270 n., 455, 
456-9, 461-4 
of Normal Curve, 269 
Standard Deviation of, 487 
Mean Difference, 114, 461-4 
Standard Deviation of, 487 
Mean Square Deviation, see Standard 
Deviation 

Means, see Averages 
Median, 102 seq., 459, 461, 462, 463, 
470, 472, 475 

Graphic Method, 106, 138-9; Dia- 
grams, facing 106, 138 
Examples : Extract from Railway 
Time-Table, 106-7 
Examples : Ages of Married Men, 
107 

American Wage Statistics, 138-9 
as Index-Number, 206 
Interpolation of, 227, 236-7 
Standard Deviation of, 485-6 
Method of Least Squares, 137, 239, 
364, 452 seq., 481 
Misfit, Test of, 426 
Mode, 95 seq., 139, 248 

Examples : U.S.A. Wage Statistics, 
96 seq., 139 
Heights of Men, 99 
Interpolation of, 228, 237 
Modulus, 252 
Moments, 250 seq. 

Examples : 

Random Selection of Digits, 
256-7 

Right Ascension of the Pole 
Star, 255 

Sauerbeck’s Index-Numbers, 254 
Table of Chances, 255 
Weights of Boys, 253 
and Constants of Second Ap- 
proximation to the Curve of Error, 
441 seq. 

of the Correlation Surface, 

361-2 

of Law of Proportional Effect, 


474 

of Normal Curve, 269 

of Translated Curve, 471 

Standard Deviation of, 420, 

450 seq. 

Moving Averages, 132 seq., 163 
Example : Exports Statistics, 132 
seq. 


Multinomial Theorem, 292 n.; Proof
of Law of Error, 291 seq.

Multiple Correlation, 403 seq. 

Newton’s Interpolation Formula,
226, 234

Normal Correlation Surface, see Correlation

Normal Law and Curve of Error, see
Error, Law and Curve
Numbers, Great, see Great Numbers 
Law of Small, 284 seq. 

Occupation, 27 seq., 61 
Official Statistics, 10 

Parabolic Equation, 225, 230-1
Pareto's Equation, 346 seq., 460, 462 
Partial Correlation, 398 seq. 

Coefficient, 400 

Examples : 

Constitution of Family and 
Expenditure on Food, 400 seq. 
Constitution of Family and 
Number of Rooms, 402 
Constitution of Family, Income 
and Rent, 402-3 

Partial Regression Coefficients, 400 
Pearson's Frequency Curves, 344 seq.
Type III, 483, 485 
Example : Fitting of Type III, 
310 

Test for Goodness of Fit, 427 

seq. ; Table, Value of $X^2$, 431
Examples, 432-3 
Percentage, 83 
Misfit, 426 

Percentiles, 102, 472, 475; standard 
deviation of, 485 ; see Median, 
Quartiles, Deciles 

Periodic Fluctuations, 148, 159 seq., 
220-1, 339 seq. 

Example : Unemployment, 160 seq. 
Pictorial Diagrams, 139-40 
Population, 25, 145 seq., 381-2 

Census, 18, 20 seq., 57 seq., 128,

281, 313, 402 
Powers of Integers, 434-5 
Precision of Average, 415-16 

of Standard Deviation, 416-17 

of Sums and Averages, 312 seq., 

409 seq. 

Predominant Value, 98 ; see Mode 
Prices, 65 seq., 171, 198, 201 seq., 
254, 324 seq. 

Probability, 259 seq. 

Addition of Chances, 261 
Bernoulli's Laws, 273-4 
Deduction of the Normal Law of 
Error, 261 seq. 

Examples, 273 seq. 

Inverse, 409 seq., 489





Probability (cont.) :

Law of Small Numbers, 284 seq. 
Multiplication of Chances, 260-1 
Standard Deviation and Mean Cube
of Error of a Sum and Average,
287 seq., 300-1 

Probable Error, 113, 248; standard 
deviation of, 487 

of Normal Curve, 270, 272

Product, Error in, 185, 318 
Production, Census of, 27, 51 
Purchasing Power, see Index-Num- 
bers. 

Quartile Deviation, 113 ; see Probable Error

Quartiles, 102 ; standard deviation
of, 486 

Examples, see Averages, Applica- 
tions of 

of Normal Curve, 272 

Quotient, Error in, 185-6, 193, 319, 
326 seq. 

Random Fluctuations, 148 

Selection, 259, 278-9 

Ranks, correlation of, 477 
Rates, 83 

Ratio, of Averages, Error in, 186, 193, 
326, 446-7, 448-9 

Correlation, 365 seq., 366, 379 

Error in, 185-6, 193, 319, 326 seq.

Rectangle Diagrams, 140 
Regression, 352 
Coefficient of, 362 

Standard Deviation of, 423 
Curve of, 352

Equation of, 362 seq., 400, 405-6 
see Examples under Correlation 
Partial Regression Coefficient, 400 
Rectilinear, 363 seq. ; 2 variables, 
479; n variables, 481 ; Diagram,
390 

Relative Errors, see Errors 
Retail Price Index, 208 seq. 

Revenue Statistics, 143 seq. ; Dia- 
gram, facing 142 

Samples, 198, 206, 208 
small, distribution of second mo- 
ment, 483 

Sampling, Application of the Normal 
Law, 277 seq. 

Examples, 278, 280-1 
Selection by Strata, 332-3, 336-7 
Scale, Choice of, 132, 145-6, 149 

Logarithmic, 170

Standard, 153 

Schedules, see Blank Forms 
Series, Correlation of Time, 155, 
342 n., 374 seq., 386 seq., 467; see
Comparison of Series


Sheppard’s Corrections, 253, 439 seq.,
457

Significance of Differences between 
Averages, 329 seq. 

Skewness, 116 seq., 249-50, 251-2, 
253 seq., 484 

Small Numbers, Law of, 284 seq. 
Smoothing of Curves, 132 seq. 
Examples : 

American Wage Statistics, 138-9 
Exports Statistics, 132 seq. 
Standard Deviation, 112, 249, 251 
of an Average, 287 seq., 300-1, 316, 
342 n., 418 seq., 452 
of Binomial Series, 263-4 
Precision of, 416-17 
Calculation of, 253 seq. 
of Coefficient of Regression, 423 
of Correlation Coefficient, 422-3, 
452, 488 
of Deciles, 486 
of Difference, 288 
of Interpercentile difference, 486 
of Mean Deviation, 487 
of Mean Difference, 487 
of Median, 485 
of Moments, 420, 450 seq. 
of Percentiles, 485 
of Probable Error, 487 
of Quartiles, 486 
of Ratio of Averages, 446 
of Ratio of Weighted Averages, 
448 

of Regression Residual, 480, 482 
of Standard Deviation, 420, 450 
seq. 

of a Sum, 287 seq., 300-1, 316, 
353 

Statistical Abstract for the United
Kingdom, 11

Statistical Coefficients, 94-5 

Groups, 110

Maps, 141 

Statistics, Definitions of, 3, 7, 82

Scope of, 3 seq., 17 

Stirling’s Formula, for $m!$, 435-6
Subsidiary Curves, 221 
Sum, Error in, 182, 312 seq. 

Mean Cube of Error of, 287, 

288 

of Powers of Integers, 434-5 

Standard Deviation of, 287 seq., 

300-1, 353
Summary, 14, 16 

Example : Wage Statistics, 122-3 
Summation and Integration, Euler- 
Maclaurin Theorem, 436 seq. 
Surface, Correlation, see Correlation 
Symmetry and Asymmetry, see 
Skewness 

of Normal Curve, 269 

Systems of Weighting, see Weighting 
Tables, Logarithms, 176-7

Integral of Normal Curve of Error, 
271 

Second Approximation of Curve of 
Error, 303 
Value of $X^2$, 431
Tabulation, 14, 15, 52 seq. 

Examples : 

Changes of Wages, 75 seq. 

Poor Law Returns, 1833, 61 seq. 
Population Census, 57 seq. 
Report on Wholesale Prices 
(American), 65 seq. 

Wage Census, 70 seq. 

Wage Statistics, 122 
of Descriptive Answers, 120 

Example : Working of Overtime, 
120-1

Tellers, 4, 23, 24, 28, 31 
Time Series, Correlation of, 155, 
342 n., 374 seq., 386 seq., 467 
Trade Unions, Eighth Report on, 60 
Translation, Edgeworth's Method of, 
470 

Trend, 132 seq., 137, 148, 337 seq., 

465

Examples : 

Export Statistics, 132 seq. 
Marriage Rates, 338-9 
Recorded Times for the “Oaks,”
338 

Unbiassed Errors, 190 seq., 199 
Undulatory Fluctuations, 148 
Unemployment, 36-7, 160 seq., 174; 

Diagrams, facing 162, 174 
Ungraded Variables, Correlation of, 
367 seq. 

Unit, Definition of, 18 seq. 

Examples : 

Foreign Trade Statistics, 43 seq. 
French Wage Statistics, 37-8 
Income Tax Commission, 19-20 
Population Census, 20 seq. 
Unofficial Investigation (Livelihood
and Poverty), 39 seq.
Wage Census, 30 seq. 

Universe, 277 


Universe with Varying Chances, 
332-3, 336-7 

Unofficial Investigation (Livelihood 
and Poverty), 39 seq. 

Variate Difference Correlation, 
376-7, 383

Variance, 484 

Variation, Coefficient of, 116

Wage Census, 12, 30, 32 seq., 70 seq., 
89-90, 103 

Wages, 30 seq., 63 seq., 84-5, 86-7, 
90 seq., 122, 126, 132, 138-9, 

323-4 

Changes of, 75 seq., 118 seq., 

188, 197, 213, 215 seq., 327-8; 
Diagram, facing 127 
Wallis's Theorem for the value of $\pi$,
434 

Weighted Averages, 86 seq., 88; see 
Index-Numbers 
Examples : 

Agricultural Wages, 90 seq. 

Wage Census, 89-90 
Weighted Averages, Absolute Error
in, 316-17 

Relative Error in, 184-5, 320 seq.

Examples : 

Sauerbeck’s Index-Numbers, 324 
seq. 

Wage Statistics, 323-4 

Relative Error in Ratio, 186 

seq., 327 seq., 448-9 
Examples : 

Family Budgets, 189 
Wage Statistics, 188, 327-8 

Standard Deviation of, 316 

Weighted Sum, Absolute Error in, 
316 

Standard Deviation of, 316 

Weighting, 87 seq., 202, 206, 209 
Examples : 

Wage Census, 89-90 
Agricultural Wages, 90 seq. 
Wheat Statistics, 146 seq., 156 seq. ; 

Diagram, facing 146 
Wholesale Prices, see Index-Numbers 